Json2csv parser is taking too long to parse JSON to CSV

I have an API backed by an AWS Lambda function that does the following:

  1. Gets JSON data from S3 (around 60,000 records)
  2. Uses the json2csv library to convert the JSON data to a CSV string
  3. Puts the resulting CSV string back into an S3 bucket

Step 2 above is taking too long to convert the JSON data into a CSV string. The library I am using is json2csv: https://www.npmjs.com/package/json2csv

Following is my code:

const { Parser } = require('json2csv');

// `records` already holds the JSON data fetched from S3 (an array of objects)

let headers = [
    {
      label: "Id",
      value: "id"
    },
    {
      label: "Person Type",
      value: "type"
    },
    {
      label: "Person Name",
      value: "name"
    }
];

let json2csvParser = new Parser({ fields: headers });

console.log("Parsing started");
let dataInCsv = json2csvParser.parse(records);
console.log("Parsing completed");

// PutObject of dataInCsv in s3
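
For reference, the surrounding S3 calls look roughly like the sketch below. The bucket and key names are made up, and it assumes the aws-sdk v2 client that the Node.js Lambda runtime bundles, running inside an async handler.

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

// Step 1: fetch the JSON object from S3 and parse it into `records`
const data = await s3.getObject({ Bucket: 'my-input-bucket', Key: 'records.json' }).promise();
const records = JSON.parse(data.Body.toString('utf-8'));

// ... json2csv parsing as above produces dataInCsv ...

// Step 3: put the CSV string back into S3
await s3.putObject({
  Bucket: 'my-output-bucket',
  Key: 'records.csv',
  Body: dataInCsv,
  ContentType: 'text/csv'
}).promise();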

It takes around 20 seconds to parse 60K records. Is there anything I can do to improve the performance here, or another library I should use? I thought in-memory operations were fast, so why is this parsing so slow? Any help is appreciated.

1 Answer


If you are reading from and writing to files, you can use this asynchronous, stream-based solution taken from the json2csv package docs.

const { createReadStream, createWriteStream } = require('fs');
const { Transform } = require('json2csv');

const fields = ['field1', 'field2', 'field3'];
const opts = { fields };
const transformOpts = { highWaterMark: 16384, encoding: 'utf-8' };

// inputPath points to the JSON file to read and outputPath to the CSV file to write
const input = createReadStream(inputPath, { encoding: 'utf8' });
const output = createWriteStream(outputPath, { encoding: 'utf8' });
const json2csv = new Transform(opts, transformOpts);

// Stream the JSON in, convert it to CSV chunk by chunk, and stream the CSV out
const processor = input.pipe(json2csv).pipe(output);

You can replace createReadStream and createWriteStream with whatever streams you need in your AWS Lambda, possibly this one.
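
For instance, inside the Lambda handler the two file streams can be swapped for S3 streams along these lines. This is only a sketch under a few assumptions: the bucket and key names are placeholders, and it relies on the aws-sdk v2 client (s3.getObject(...).createReadStream() for the input, s3.upload() with a PassThrough stream for the output) that the Node.js Lambda runtime ships with.

const AWS = require('aws-sdk');
const { PassThrough } = require('stream');
const { Transform } = require('json2csv');

const s3 = new AWS.S3();

exports.handler = async () => {
  const fields = ['id', 'type', 'name'];
  const json2csv = new Transform({ fields }, { highWaterMark: 16384, encoding: 'utf-8' });

  // Stream the JSON object out of S3 instead of buffering the whole payload in memory
  const input = s3
    .getObject({ Bucket: 'my-input-bucket', Key: 'records.json' })
    .createReadStream();

  // s3.upload() accepts a stream as Body, so pipe the generated CSV into a PassThrough
  const output = new PassThrough();
  const upload = s3
    .upload({ Bucket: 'my-output-bucket', Key: 'records.csv', Body: output })
    .promise();

  input.pipe(json2csv).pipe(output);
  await upload;
};

This way the JSON never has to sit fully in memory and the CSV upload starts while the conversion is still running.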