fscrawler gives three javascript errors


I'm new to Elasticsearch and have been trying to use the ingest plugin (I have posted a couple of questions about that). It has been suggested that for what I am trying to do I should be using FSCrawler. I'm using Elasticsearch 5.5.1 and I've installed FSCrawler 2.3. I have Java 8.0.1 installed and I have created an environment variable 'JAVA_HOME' pointing to the Java directory. Using Kibana, I have created the following:

PUT _ingest/pipeline/docs 
{
  "description": "documents",
  "processors" : [
    {
     "attachment" : {
        "field": "data",
        "indexed_chars" : -1
      }
    }]
}
PUT myindex
{
  "mappings" : {
    "documents" : {
      "properties" : {
        "attachment.data" : {
          "type": "text",
          "analyzer": "standard"
        }
      }
    }
  }
}
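For reference, this is roughly what indexing one file through the `docs` pipeline involves when it is done by hand: the attachment processor reads the file as a base64-encoded string from the `data` field. The Python sketch below builds such a request against `PUT /myindex/documents/<id>?pipeline=docs`. It assumes the node address from the FSCrawler settings further down (`127.0.0.1:9200`) and the `documents` type from the mapping above, and it assumes basic auth is needed because the settings carry `elastic`/`changeme` credentials. `build_attachment_request` and `send` are hypothetical helpers, not part of any library. One detail worth checking: by default the attachment processor writes the extracted text to `attachment.content`, not `attachment.data`, so the `attachment.data` mapping above would not apply to the extracted text.

```python
import base64
import json
import urllib.request


def build_attachment_request(base_url, index, doc_type, doc_id, content, pipeline="docs"):
    """Build the URL and JSON body for indexing one file through an ingest pipeline."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}?pipeline={pipeline}"
    # The attachment processor expects the raw file bytes base64-encoded in "data".
    body = {"data": base64.b64encode(content).decode("ascii")}
    return url, body


def send(url, body, username=None, password=None):
    """PUT the body to Elasticsearch, optionally with HTTP basic auth, and return the JSON reply."""
    headers = {"Content-Type": "application/json"}
    if username is not None:
        # Assumption: security (e.g. X-Pack) is enabled, so basic auth is required.
        token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
        headers["Authorization"] = f"Basic {token}"
    request = urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="PUT"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, reading a file with `open(path, "rb").read()`, passing it to `build_attachment_request("http://127.0.0.1:9200", "myindex", "documents", "1", data)` and then calling `send(url, body, "elastic", "changeme")` would index it through `docs`. The request building is kept separate from sending so the body can be inspected without a running node.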

In my _settings.json file for FSCrawler I have set the url to my documents folder, and within the elasticsearch section I have included "index" : "myindex".

I run it from PowerShell with the command .\fscrawler mydocs --loop 1

The output from the command was attached as a screenshot.

Here is my _settings.json file for FSCrawler:

{
  "name" : "docs",
  "fs" : {
    "url" : "w:\\Elasticsearch\\Docs",
    "update_rate" : "15m",
    "excludes" : [ "~*" ],
    "json_support" : false,
    "filename_as_id" : false,
    "add_filesize" : true,
    "remove_deleted" : true,
    "add_as_inner_object" : false,
    "store_source" : false,
    "index_content" : true,
    "attributes_support" : false,
    "raw_metadata" : true,
    "xml_support" : false,
    "index_folders" : true,
    "lang_detect" : false,
    "continue_on_error" : false,
    "pdf_ocr" : true
  },
  "elasticsearch" : {
    "nodes" : [ {
      "host" : "127.0.0.1",
      "port" : 9200,
      "scheme" : "HTTP"
    } ],
    "index" : "myindex",
    "bulk_size" : 100,
    "flush_interval" : "5s",
    "username" : "elastic",
    "password" : "changeme"
  },
  "rest" : {
    "scheme" : "HTTP",
    "host" : "127.0.0.1",
    "port" : 8080,
    "endpoint" : "fscrawler"
  }
}
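A quick way to sanity-check a settings file like this one is to read it and print the Elasticsearch address and index it points at. The Python sketch below does that and also flags when the `name` inside the file differs from the job name used on the command line (here the file says `docs` while the command uses `mydocs`, which may or may not matter but is worth ruling out). `load_settings` and `check_settings` are hypothetical helpers; the fallback to the job name when no `index` is set reflects FSCrawler's documented default as I understand it, and is labeled as an assumption in the code.

```python
import json


def load_settings(path):
    """Read an FSCrawler _settings.json file into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def check_settings(settings, job_name):
    """Return ((base_url, index), problems) for an FSCrawler settings dict."""
    problems = []
    if settings.get("name") != job_name:
        problems.append(
            f"settings name {settings.get('name')!r} differs from job name {job_name!r}"
        )
    es = settings["elasticsearch"]
    node = es["nodes"][0]
    # The file spells the scheme "HTTP"; URLs conventionally use lowercase.
    scheme = node.get("scheme", "http").lower()
    base_url = f"{scheme}://{node['host']}:{node['port']}"
    # Assumption: without an explicit "index", FSCrawler uses the job name.
    index = es.get("index", settings.get("name", job_name))
    return (base_url, index), problems
```

With the settings above, `check_settings(load_settings(path), "mydocs")` would return `("http://127.0.0.1:9200", "myindex")` plus one message about `docs` versus `mydocs`.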

It’s better not to include screenshots; copy and paste the logs instead.
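One way to get the console output as text is to run the crawler from a script and save everything it prints to a file. A minimal sketch using Python's standard `subprocess` module; `run_and_capture` is a hypothetical helper that merges stderr into stdout so errors and normal log lines stay together in one file.

```python
import subprocess


def run_and_capture(cmd, log_path):
    """Run cmd, write its combined stdout/stderr to log_path, return (exit code, text)."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave error output with normal output
        text=True,
    )
    with open(log_path, "w", encoding="utf-8") as f:
        f.write(result.stdout)
    return result.returncode, result.stdout
```

For example, `run_and_capture(["fscrawler", "mydocs", "--loop", "1"], "fscrawler.log")`. On Windows the launcher is assumed to be a batch file, so the command may need the full path to `fscrawler.bat` in FSCrawler's `bin` folder.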

Then:

  • You don’t need to define an ingest pipeline
  • What do your FSCrawler settings look like?
  • There is a warning about an old FSCrawler version. Were you using 2.2 before?
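The pipeline is unnecessary because FSCrawler extracts the text from each file itself (it bundles Apache Tika) and sends the result straight to Elasticsearch, where the extracted text lands in a `content` field. Once a run completes, a full-text search on that field is a simple way to confirm that documents were indexed. The Python sketch below only builds the request; it assumes the `myindex` index and node address from the settings in the question, `build_search` is a hypothetical helper, and `file.filename` is an assumption about the field FSCrawler uses for the file name.

```python
import json


def build_search(base_url, index, text, size=10):
    """Build the URL and body for a full-text search over FSCrawler's "content" field."""
    url = f"{base_url}/{index}/_search"
    body = {
        "size": size,
        "query": {"match": {"content": text}},
        # Only return the file name; "file.filename" is an assumed field name.
        "_source": ["file.filename"],
    }
    return url, body


if __name__ == "__main__":
    url, body = build_search("http://127.0.0.1:9200", "myindex", "invoice")
    print("POST", url)
    print(json.dumps(body, indent=2))
```

The same body can be pasted into Kibana as `GET myindex/_search`.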