Crawling and Searching FTP Folder With Ambar

In this post we'll show how to make your FTP folder more useful by adding search capabilities to it. By the end of the post you will be able to run a fuzzy search over the content and metadata of the documents in your FTP folder. All the magic is done by Ambar, a document search engine, so you need to install it before proceeding (see the Installation Instructions).

Requirements:

  • An up-and-running Ambar CE or EE instance
  • A running FTP server; this example uses the public Debian FTP server
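
Before pointing a crawler at the server, it can help to confirm that the FTP folder is reachable with the credentials you plan to use. The snippet below is a minimal sketch using Python's standard ftplib; it is not part of Ambar. The helper name list_ftp_folder and the anonymous login defaults are assumptions made for illustration.

```python
from ftplib import FTP


def list_ftp_folder(host, path, user="anonymous", password="",
                    ftp_factory=FTP, timeout=30):
    """Return the entry names found under `path` on the FTP server `host`.

    `ftp_factory` exists only so the connection class can be swapped out
    (for example in a test); by default the standard ftplib.FTP is used.
    """
    ftp = ftp_factory(host, timeout=timeout)
    try:
        ftp.login(user, password)   # anonymous login unless told otherwise
        ftp.cwd(path)               # fails with an error if the path is wrong
        return ftp.nlst()           # plain list of names in the folder
    finally:
        ftp.close()                 # always release the connection


# Example call (needs network access, so it is left commented out):
# print(list_ftp_folder("ftp.debian.org", "/debian/doc/"))
```

If the call raises an error, the host name, path or credentials most likely need fixing before the crawler can use them.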

Setting Up FTP Crawler

Let's set up Ambar to crawl an FTP folder. Go to the Settings page and create a new crawler by clicking the pink + button in the bottom-right corner.
You'll see a JSON file with the new crawler's settings. It may look frightening, but it's actually quite simple and easy to follow.

Setup FTP crawler with Ambar

The following fields are required:

  • id - the id of the new crawler; it should contain only alphanumeric characters ("DebianPublicFtp" in my case)
  • description - a description of the new crawler ("Public Debian FTP Server")
  • type - ftp
  • locations - the locations for the crawler to crawl; there can be one or several. Each location needs the host name or IP address of the FTP server and the path to the folder to be crawled. In my case host_name is "ftp.debian.org", so I can omit ip_address, and the path is "/debian/doc/".
  • credentials - the login and password for the FTP server; auth_type is "basic".
  • schedule - a cron-style schedule; set active to true to enable it.
  • verbose - the logging level for the crawler. It's highly recommended to set it to false for locations with lots of files.

All other fields are optional. Use the image above as a reference when filling them in.
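
To make the required fields concrete, here is a rough sketch that assembles them into a JSON document with Python. The top-level field names, host_name, auth_type and active come from the list above; the nested key names "location", "login", "password" and "cron", the cron expression and the anonymous credentials are assumptions for illustration, so check them against the template Ambar shows on the Settings page before using them.

```python
import json
import re


def build_ftp_crawler_settings(crawler_id, description, host_name, location,
                               login, password, cron="*/15 * * * *",
                               active=True, verbose=False):
    """Assemble the required crawler fields into a dict ready for json.dumps."""
    # The id may contain only alphanumeric characters.
    if not re.fullmatch(r"[A-Za-z0-9]+", crawler_id):
        raise ValueError("crawler id must contain only alphanumeric characters")
    return {
        "id": crawler_id,
        "description": description,
        "type": "ftp",
        # One or several locations; ip_address is omitted because
        # host_name is given.
        "locations": [
            {"host_name": host_name, "location": location},  # "location" key is assumed
        ],
        "credentials": {
            "auth_type": "basic",
            "login": login,        # assumed key name
            "password": password,  # assumed key name
        },
        "schedule": {
            "active": active,
            "cron": cron,          # assumed key name and example value
        },
        "verbose": verbose,        # keep False for locations with lots of files
    }


settings = build_ftp_crawler_settings(
    crawler_id="DebianPublicFtp",
    description="Public Debian FTP Server",
    host_name="ftp.debian.org",
    location="/debian/doc/",
    login="anonymous",   # assumed: the public server accepts anonymous access
    password="",
)
print(json.dumps(settings, indent=2))
```

Building the settings in code is optional; the same values can be typed directly into the JSON editor on the Settings page.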

Click the CREATE button. The new crawler will appear in the list. Hit ENQUEUE to start crawling and watch the progress.
When Ambar finishes crawling, you'll see a yellow "done" line in the logs.

Running Ambar FTP crawler

As the FTP crawler adds files to Ambar, Ambar tries to extract the content of every file and runs OCR if needed. You can watch the processing state on the Statistics page. Files become searchable as Ambar processes them.

Processing files with Ambar

Search FTP Folder

To search through your FTP folder, go to the Search page and select your FTP folder as the source.

Selecting a source

Ambar's advanced search capabilities are at your disposal! Search for * to show all the files. A detailed description of the Ambar query syntax is available here.

Searching through your FTP folder with Ambar

That's it!

Congrats! You've just set up FTP folder crawling with Ambar. Instructions on how to set up other types of crawlers can be found here.

Stay tuned and subscribe to our blog!

Ilya P
