Tuning Your Ambar: Mastering Config

WARNING: This description is valid for Ambar 1.3.0 and later

Let's dive into the Ambar config file and describe its fields. You can use it to fine-tune Ambar for your hardware and environment. Some fields are only available in the enterprise version, so we'll omit them.

  • analyticsToken - analytics token, used to improve user experience and track errors. Remove it if you would rather not share your experience with the Ambar team
  • auth - api authentication, basic or none
  • mode - Ambar running mode, ce by default
  • defaultLangAnalyzer - language analyzer for the default user: ambar_en for English, ambar_ru for Russian, ambar_it for Italian, ambar_de for German, or ambar_cjk for CJK
  • pipelineCount - number of pipelines, 1 is maximum for the community edition
  • crawlerCount - max. number of simultaneously running crawlers, 1 is maximum for the community edition
  • nerEnabled - enables/disables named entity recognition and extraction
  • dbCacheSizeGb - cache size for the MongoDB core, in GB
  • uiLang - UI language, en or ru
  • esHeapSize - maximum JVM memory for ES (Elasticsearch) to use; setting it to 4g or more is highly recommended when indexing large files
  • ocrPdfMaxPageCount - max. number of pages in a PDF file to perform OCR on
  • ocrPdfSymbolsPerPageThreshold - the threshold for deciding whether to perform OCR on a page. If the number of alphabetic symbols extracted from the page is below the threshold, the page is rendered and OCR is performed
  • dropboxClientId - only supported in enterprise and cloud versions
  • dropboxredirectUri - only supported in enterprise and cloud versions
  • host - host where your Ambar runs
  • protocol - Ambar UI and API protocol, Ambar CE supports only http
  • port - Ambar port
  • dockerRepo - docker registry to pull the images from
  • dataPath - path to the folder where Ambar stores its data (documents and indexes)
  • preserveOriginals - store original files in Ambar or not. Can't be changed after initial run.
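As a quick sanity check before restarting, the constraints stated in the list above can be expressed as a small script. This is only a sketch: the function name `check_ce_config` and the rule tables are hypothetical, the rules are read off the field descriptions in this post, and none of it is taken from Ambar's own code.

```python
# Hypothetical validator for a community edition (CE) config.
# The rules mirror the field descriptions in this post, not Ambar's code.

# Fields with a fixed set of documented values.
ALLOWED = {
    "auth": {"basic", "none"},
    "uiLang": {"en", "ru"},
}

# Fields whose value is capped in the community edition.
CE_LIMITS = {"pipelineCount": 1, "crawlerCount": 1}


def check_ce_config(cfg):
    """Return a list of human-readable problems found in a CE config dict."""
    problems = []
    for key, allowed in ALLOWED.items():
        if key in cfg and cfg[key] not in allowed:
            problems.append(
                f"{key}: expected one of {sorted(allowed)}, got {cfg[key]!r}"
            )
    for key, limit in CE_LIMITS.items():
        if key in cfg and int(cfg[key]) > limit:
            problems.append(f"{key}: community edition allows at most {limit}")
    # Per the list above, Ambar CE supports only http.
    if cfg.get("protocol", "http") != "http":
        problems.append("protocol: community edition supports only http")
    return problems
```

An empty list means none of these checks failed; it does not guarantee that Ambar will accept the config.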

Keep in mind that you have to restart Ambar (sudo ./ambar.py restart) for your config changes to take effect.
The default config for the community edition looks like this:

    {
        "analyticsToken": "cda4b0bb11a1f32aed7513b08c455922",
        "auth": "basic",
        "mode": "ce",
        "defaultLangAnalyzer": "ambar_en",
        "pipelineCount": 1,
        "crawlerCount": 1,
        "nerEnabled": "true",
        "dbCacheSizeGb": 2,
        "uiLang": "en",
        "esHeapSize": "1g",
        "ocrPdfMaxPageCount": 5000,
        "ocrPdfSymbolsPerPageThreshold": 100,
        "dropboxClientId": "",
        "dropboxredirectUri": "",
        "host": "",
        "protocol": "http",
        "port": "80",
        "dockerRepo": "ambar",
        "dataPath": "/opt/ambar",
        "preserveOriginals": "false"
    }
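If you prefer to change a field from a script rather than by hand, a minimal sketch could look like the one below. It assumes the config is stored as a single JSON object in a file; the helper name `update_config` and the `config.json` path in the usage comment are hypothetical and not part of Ambar.

```python
import json


def update_config(path, **changes):
    """Load a JSON config file, overwrite the given fields, and save it back."""
    with open(path, encoding="utf-8") as f:
        cfg = json.load(f)

    # Refuse fields that are not already present, to catch typos early.
    unknown = [key for key in changes if key not in cfg]
    if unknown:
        raise KeyError(f"unknown config fields: {unknown}")

    cfg.update(changes)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cfg, f, indent=4)
    return cfg


# Example (path is an assumption): raise the ES heap size for large files.
# update_config("config.json", esHeapSize="4g")
```

After the file is rewritten, Ambar still has to be restarted (sudo ./ambar.py restart) before the new value is picked up.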

Stay tuned and subscribe to our blog!

Igor S
