In this post we'll discuss how to make your FTP folder more valuable by adding search capabilities to it. By the end of the post you will be able to run a fuzzy search over the content and metadata of the documents in your FTP folder. All the magic is done by Ambar, a document search engine, so you need to install it before proceeding (Installation Instructions).
- Ambar CE or EE, up and running
- A running FTP server; this example uses the public Debian FTP server
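Before pointing a crawler at an FTP folder, it can help to confirm that the folder is reachable with the credentials you plan to use. The snippet below is a minimal sketch using Python's standard ftplib module; the function name list_ftp_dir and its ftp_factory parameter are made up for this illustration and are not part of Ambar. It assumes the server accepts the given login (anonymous by default).

```python
from ftplib import FTP


def list_ftp_dir(host, path, user="anonymous", password="", ftp_factory=FTP, timeout=30):
    """Log in to an FTP server and return the entry names found under `path`.

    `ftp_factory` is a hypothetical hook that lets you swap in a stub
    for testing; by default the real ftplib.FTP class is used.
    """
    # FTP(host, timeout=...) opens the control connection immediately.
    with ftp_factory(host, timeout=timeout) as ftp:
        ftp.login(user, password)  # anonymous login unless credentials are given
        ftp.cwd(path)              # fails with an ftplib error if the path is missing
        return ftp.nlst()          # plain list of names in the current directory


# Example call with the host and path used in this post (requires network access
# and a server that still accepts anonymous FTP):
# print(list_ftp_dir("ftp.debian.org", "/debian/doc/"))
```

If the call returns a list of file names, the same host, path, and credentials should be usable in the crawler settings.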
Setting Up FTP Crawler
Let's set up Ambar to crawl an FTP folder. Go to the Settings page and create a new crawler by clicking the pink + button in the bottom-right corner.
You'll see a JSON file with the new crawler's settings. It looks frightening, but it's actually simple and self-explanatory.
The following fields are required:
- id: the ID of the new crawler. It should contain only alphanumeric characters ("DebianPublicFtp" in my case).
- description: a description of the new crawler ("Public Debian FTP Server").
- locations: the locations for the crawler to crawl; there can be one or several. Each location needs the host name or IP address of the FTP server and the path to be crawled. In my case, host_name is "ftp.debian.org", so I can omit ip_address, and "/debian/doc/" is the path to crawl.
- credentials: the login and password for the FTP server.
- schedule: the cron-style schedule; set its enable flag to true to turn the schedule on.
- verbose: the logging level for the crawler. It's highly recommended to set it to false for locations with lots of files.
All other fields are optional. Use the image above as a guide when filling in the settings.
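The required fields described above can be sketched as a small Python script that builds the settings and prints them as JSON. This is only an illustration, not Ambar's official template: the keys id, description, locations, host_name, ip_address, credentials, schedule, and verbose come from this post, while the nested key names location, login, password, is_active, and cron_schedule, the anonymous login, and the example cron expression are assumptions. Compare them against the JSON that Ambar shows you when you create a crawler.

```python
import json
import re


def build_crawler_settings(crawler_id, description, host_name, path,
                           login="anonymous", password=""):
    """Build a dict with the required crawler fields described in this post."""
    # The post states that the id should contain only alphanumeric symbols.
    if not re.fullmatch(r"[A-Za-z0-9]+", crawler_id):
        raise ValueError("crawler id must contain only alphanumeric characters")
    return {
        "id": crawler_id,
        "description": description,
        "locations": [
            {
                "host_name": host_name,
                "ip_address": "",   # omitted because host_name is set
                "location": path,   # assumed key name for the path to crawl
            }
        ],
        # Assumed key names; anonymous access is an assumption for a public server.
        "credentials": {"login": login, "password": password},
        # Assumed key names; the expression means "every 15 minutes" in
        # standard five-field cron syntax.
        "schedule": {"is_active": True, "cron_schedule": "*/15 * * * *"},
        # False is recommended in the post for locations with lots of files.
        "verbose": False,
    }


settings = build_crawler_settings(
    "DebianPublicFtp",
    "Public Debian FTP Server",
    "ftp.debian.org",
    "/debian/doc/",
)
print(json.dumps(settings, indent=2))
```

Validating the id up front catches a typo such as a space or a dash before the settings are pasted into the crawler editor.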
Click the CREATE button. The created crawler will appear in the list. Hit ENQUEUE to start crawling and watch the process.
When Ambar finishes crawling, you'll see a yellow "done" line in the logs.
While the FTP crawler adds files to Ambar, Ambar tries to extract the contents of every file and runs OCR if needed. You can watch the processing state on the Statistics page. Files become searchable as Ambar processes them.
Search FTP Folder
To search through your FTP folder, go to the Search page and select your FTP folder as a source.
Ambar's advanced search capabilities are at your disposal! Search for * to show all the files. Here you can find a detailed description of the Ambar query syntax.
Congrats! You've just set up FTP folder crawling with Ambar. You can find instructions on how to set up other types of crawlers here.
Stay tuned and subscribe to our blog!