How to split a large CSV file into multiple JSON files using the Miller command line tool?


I am currently using this Miller command to convert a CSV file into a JSON array file:

mlr --icsv --ojson --jlistwrap cat sample.csv > sample.json

It works fine, but the JSON array is too large.
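For context, on a tiny made-up sample.csv the command produces one big JSON array roughly like this (the column names and values here are invented purely for illustration):

$ cat sample.csv
name,city
alice,berlin
bob,paris

$ mlr --icsv --ojson --jlistwrap cat sample.csv
[
{
  "name": "alice",
  "city": "berlin"
},
{
  "name": "bob",
  "city": "paris"
}
]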

Can Miller split the output into many smaller JSON files of X rows each?

For example, if the original CSV has 100 rows, can I modify the command to output 10 JSON array files, with each array holding 10 converted CSV rows?

Bonus points if each JSON array can also be wrapped like this:

{
  "instances": 

//JSON ARRAY GOES HERE

}

There is 1 best solution below.

aborruso (BEST ANSWER)

You could run this:

mlr --c2j --jlistwrap put -q '
  begin {
    @batch_size = 1000;   # records per output file
  }
  # zero-based batch number for the current record
  index = int(floor((NR-1) / @batch_size));
  # zero-padded label: 0000, 0001, 0002, ...
  label = fmtnum(index, "%04d");
  filename = "part-".label.".json";
  # write the current record to its batch file
  tee > filename, $*
' ./input.csv

You will get a file named part-00xx.json for every 1000 records.
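The accepted answer does not cover the bonus wrapping. As a minimal sketch (not part of the answer above), assuming jq is installed and the files are named part-*.json as produced by the command above, you could wrap each array in a top-level "instances" object as a post-processing step; the wrapped-*.json output names are just an example:

for f in part-*.json; do
  # wrap the JSON array in {"instances": [...]} and write it to a new file
  jq '{instances: .}' "$f" > "wrapped-$f"
done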