Error while crawling path\to\file_folder: java.net.ConnectException: Connection timed out: connect

I am trying to ingest files from a remote server with FSCrawler into an existing Elasticsearch index (Elasticsearch runs on my local machine), but I get the exception above.

Below is the _settings.yml file of FSCrawler:

 ---
    name: "index_in_es_onefsc"
    server:
      hostname: "machinename.abc.com"
      port: 22
      username: "username"
      password: "password@20"
      protocol: "ssh"
    fs:
      url: "E:\\TestFilesToBeIndexed"
      update_rate: "15m"
      excludes:
      - "*/~*"
      json_support: false
      filename_as_id: false
      add_filesize: true
      remove_deleted: true
      add_as_inner_object: false
      store_source: false
      index_content: true
      attributes_support: false
      raw_metadata: false
      xml_support: false
      index_folders: true
      lang_detect: false
      continue_on_error: false
      ocr:
        language: "eng"
        enabled: true
        pdf_strategy: "ocr_and_text"
      follow_symlinks: false
    elasticsearch:
      nodes:
      - url: "http://127.0.0.1:9200"
      bulk_size: 100
      flush_interval: "5s"
      byte_size: "10mb"

There is 1 best solution below

The documentation says that when you use SSH from or to a Windows machine, the path must be written in a specific form. I think that on Windows, you need to use:

name: "index_in_es_onefsc"
fs:
  url: "/E:/TestFilesToBeIndexed"
server:
  hostname: "machinename.abc.com"
  port: 22
  username: "username"
  password: "password@20"
  protocol: "ssh"
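
If you have several Windows paths to convert, the rewrite from `E:\TestFilesToBeIndexed` to `/E:/TestFilesToBeIndexed` can be sketched in a few lines of Python. This is only an illustration of the path form shown above; the helper name `to_fscrawler_ssh_path` is hypothetical and not part of FSCrawler.

```python
def to_fscrawler_ssh_path(windows_path: str) -> str:
    """Rewrite a drive-letter path such as E:\\dir into the /E:/dir form.

    Hypothetical helper for illustration only; FSCrawler does not ship it.
    """
    drive, sep, rest = windows_path.partition(":")
    # Only plain drive-letter paths (one letter followed by a colon) are handled.
    if not sep or len(drive) != 1 or not drive.isalpha():
        raise ValueError(f"not a drive-letter path: {windows_path!r}")
    # Use forward slashes and drop leading/trailing separators.
    rest = rest.replace("\\", "/").strip("/")
    return f"/{drive}:/{rest}"
```

For example, `to_fscrawler_ssh_path("E:\\TestFilesToBeIndexed")` gives the value used for `fs.url` in the settings above.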

Note that there is a known issue when running FSCrawler from a Windows machine. It has been fixed, but if you are using a SNAPSHOT version older than the one published on June 26th, you will most likely need to upgrade.
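
Since the exception is a connection timeout, it may also be worth confirming that the SSH port on the remote machine is reachable from the machine running FSCrawler. The following is a minimal sketch using only the Python standard library; the function name `can_connect` is an assumption for illustration, and the check says nothing about whether the credentials or the path are correct.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and name-resolution failures.
        return False
```

Calling `can_connect("machinename.abc.com", 22)` with the `hostname` and `port` from the settings file returns `False` if the port cannot be reached, in which case the problem is likely at the network or SSH server level rather than in the FSCrawler settings.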