Elasticsearch throws a string-to-date conversion error when importing a CSV with Logstash

I'm trying to load a simple CSV file into Elasticsearch with Logstash.

But when I run it, I get the following error about converting a string to a date (for the first column, Date).

"error"=>{
"type"=>"mapper_parsing_exception", 
"reason"=>"failed to parse [Date]",
"caused_by"=>{
    "type"=>"illegal_argument_exception", 
    "reason"=>"Invalid format: \"Date\""}}}}

When I remove the Date column, everything works fine.

I'm using the following CSV file:

Date,Open,High,Low,Close,Volume,Adj Close
2015-04-02,125.03,125.56,124.19,125.32,32120700,125.32
2015-04-01,124.82,125.12,123.10,124.25,40359200,124.25
2015-03-31,126.09,126.49,124.36,124.43,41852400,124.43
2015-03-30,124.05,126.40,124.00,126.37,46906700,126.37

and the following logstash.conf:

input {
  file {
    path => "path/file.csv"
    type => "core2"
    start_position => "beginning"    
  }
}
filter {
  csv {
      separator => ","
      columns => ["Date","Open","High","Low","Close","Volume","Adj Close"]
  }
  mutate {convert => ["High", "float"]}
  mutate {convert => ["Open", "float"]}
  mutate {convert => ["Low", "float"]}
  mutate {convert => ["Close", "float"]}
  mutate {convert => ["Volume", "float"]}
  date {
    match => ["Date", "yyyy-MM-dd"]
    target => "Date"
  }
}
output {  
    elasticsearch {
        action => "index"
        hosts => "localhost"
        index => "stock15"
        workers => 1
    }
    stdout {}
}

It seems like I'm handling the Date field correctly. Any idea what could have gone wrong?

Thanks!


There are 2 answers below.

Accepted answer:

Thanks @Yeikel, I ended up changing the Logstash config rather than the data itself.

Before applying the csv filter, I check the line with a regex to see whether it is the header. If it is the header I drop it and move on to the next line (which will then be handled by the csv filter).

Please see the updated config that solves the header issue:

input {  
  file {
    path => "path/file.csv"
    start_position => "beginning"    
  }
}
filter {  
    if ([message] =~ "\bDate\b") {
        drop { }
    } else {
        csv {
            separator => ","
            columns => ["Date","Open","High","Low","Close","Volume","Adj Close"]
        }
        mutate {convert => ["High", "float"]}
        mutate {convert => ["Open", "float"]}
        mutate {convert => ["Low", "float"]}
        mutate {convert => ["Close", "float"]}
        mutate {convert => ["Volume", "float"]}
      date {
        match => ["Date", "yyyy-MM-dd"]
      }
    }
}
output {  
    elasticsearch {
        action => "index"
        hosts => "localhost"
        index => "stock15"
        workers => 1
    }
    stdout {
        codec => rubydebug
     }
}
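
As a side note, the csv filter has its own convert option, so the separate mutate blocks can probably be folded into the filter itself. This is only a sketch, assuming the convert hash is supported by the csv filter plugin version you have installed (check the plugin docs for your release); only the filter block changes, the input and output sections stay the same:

filter {  
    if ([message] =~ "\bDate\b") {
        # same header check as above: drop the header line
        drop { }
    } else {
        csv {
            separator => ","
            columns => ["Date","Open","High","Low","Close","Volume","Adj Close"]
            # convert the numeric columns inside the csv filter
            # instead of five separate mutate blocks
            convert => {
                "Open"   => "float"
                "High"   => "float"
                "Low"    => "float"
                "Close"  => "float"
                "Volume" => "float"
            }
        }
        date {
            match => ["Date", "yyyy-MM-dd"]
        }
    }
}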

Other answer:

The problem is in the file itself. Logstash reads the first line and is unable to parse it:

Date,Open,High,Low,Close,Volume,Adj Close

Right now the solution is to remove the header from the file:

2015-04-02,125.03,125.56,124.19,125.32,32120700,125.32
2015-04-01,124.82,125.12,123.10,124.25,40359200,124.25
2015-03-31,126.09,126.49,124.36,124.43,41852400,124.43
2015-03-30,124.05,126.40,124.00,126.37,46906700,126.37

And it should be okay.

There is an issue about this on GitHub: https://github.com/elastic/logstash/issues/2088
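
For what it's worth, later versions of the csv filter plugin added a skip_header option, which avoids both editing the file and the drop conditional. This is only a sketch; the option did not exist in the plugin version that was current when this question was asked, so check that it is available in yours:

filter {
    csv {
        separator => ","
        columns => ["Date","Open","High","Low","Close","Volume","Adj Close"]
        # with skip_header enabled, a row whose values exactly match
        # the declared column names is treated as the header and skipped
        skip_header => true
    }
}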