I have crawled a lot of JSON files into a data folder, all named by timestamp (./data/2021-04-05-12-00.json, ./data/2021-04-05-12-30.json, ./data/2021-04-05-13-00.json, ...).
Now I'm trying to use the ELK stack to load these JSON files, whose number keeps growing.
Each JSON file is pretty-printed like this:
{
  "datetime": "2021-04-05 12:00:00",
  "length": 3,
  "data": [
    {
      "id": 97816,
      "num_list": [1,2,3],
      "meta_data": "{'abc', 'cde'}",
      "short_text": "This is data 97816"
    },
    {
      "id": 97817,
      "num_list": [4,5,6],
      "meta_data": "{'abc'}",
      "short_text": "This is data 97817"
    },
    {
      "id": 97818,
      "num_list": [],
      "meta_data": "{'abc', 'efg'}",
      "short_text": "This is data 97818"
    }
  ]
}
I tried using the Logstash multiline plugin to read the JSON file, but it seems to handle each file as a single event. Is there any way to extract each record in the JSON "data" field as a separate event?
Also, what's the best practice for loading a growing set of pretty-printed JSON files into ELK?
Using multiline is correct if you want to handle each file as one input event.
Then you need to leverage the split filter in order to create one event for each element in the data array.
So Logstash reads one file as a whole and passes its content as a single event to the filter layer, where the split filter spawns one new event for each element in the data array.
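The filter section for this could look roughly like the sketch below. It assumes that the multiline input puts the whole file content into the message field (the usual default) and that the content is valid JSON; adjust the field names if your pipeline differs.

```
filter {
  # Parse the whole pretty-printed file content (assumed to be in "message") as JSON
  json {
    source => "message"
  }
  # Emit one event per element of the "data" array
  split {
    field => "data"
  }
}
```

After the split, each event carries a single record under data (for example [data][id] and [data][short_text]) together with the top-level datetime and length fields copied from the original event.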