How to handle huge data (>10 million rows) from DB (Postgres) to a file in NodeJS


I have to query over 10 million records from several tables (using multiple joins via TypeORM) and then send the result to another microservice, which formats it a bit and writes it all to a CSV file.

When we first developed the API for this, we simply queried the tables and sent all the data in one API call to the other microservice. That worked for 100,000 records, but it times out on the massive datasets we have been getting recently. How do I handle this?

The microservice and the API are written in NodeJS; the DB is PostgreSQL.

We are thinking of streaming the data in chunks and building the file incrementally. Is there a better approach? Essentially, we want to query this huge dataset efficiently and get it into a file. Any input would be appreciated!

1 Answer

Answered by Cosmin Ioniță:

You can create a very simple app that reads data from PostgreSQL in a streaming fashion and then groups the rows into batches that can be processed in parallel.
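
For example, here is a minimal sketch of that reader using the pg and pg-query-stream packages; the query, connection settings and batch sizes below are placeholders, not the exact setup from the question:

```js
// Stream rows out of PostgreSQL and group them into batches.
const { Pool } = require('pg');
const QueryStream = require('pg-query-stream');

const BATCH_SIZE = 10_000; // tune to your row size and memory budget

async function* readInBatches(sql, params = []) {
  const pool = new Pool(); // connection settings taken from PG* env vars
  const client = await pool.connect();
  try {
    // QueryStream uses a server-side cursor, so only a small window of rows
    // is held in memory at any time.
    const stream = client.query(new QueryStream(sql, params, { batchSize: 1000 }));
    let batch = [];
    for await (const row of stream) {
      batch.push(row);
      if (batch.length >= BATCH_SIZE) {
        yield batch;
        batch = [];
      }
    }
    if (batch.length > 0) yield batch; // flush the last, partial batch
  } finally {
    client.release();
    await pool.end();
  }
}
```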

Instead of sending each batch directly to the other microservice (via HTTP), you can upload that batch into S3 and send a message on an SQS queue.
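
A rough sketch of shipping one batch with the AWS SDK v3 clients; the bucket name, queue URL, export id and key layout are all assumptions for illustration:

```js
// Upload one batch to S3 as newline-delimited JSON, then notify the consumer via SQS.
const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');
const { SQSClient, SendMessageCommand } = require('@aws-sdk/client-sqs');

const s3 = new S3Client({});
const sqs = new SQSClient({});

const BUCKET = 'my-export-bucket';               // placeholder
const QUEUE_URL = process.env.EXPORT_QUEUE_URL;  // placeholder

async function shipBatch(exportId, batchIndex, rows) {
  // Zero-pad the index so the keys also sort in batch order.
  const key = `exports/${exportId}/batch-${String(batchIndex).padStart(6, '0')}.ndjson`;

  // The batch itself goes to S3; NDJSON keeps it cheap to parse line by line later.
  await s3.send(new PutObjectCommand({
    Bucket: BUCKET,
    Key: key,
    Body: rows.map((r) => JSON.stringify(r)).join('\n'),
  }));

  // The SQS message only carries the S3 location, never the data itself.
  await sqs.send(new SendMessageCommand({
    QueueUrl: QUEUE_URL,
    MessageBody: JSON.stringify({ bucket: BUCKET, key, batchIndex }),
  }));
}
```

Wired to the reader above, the producer loop is then roughly: `let i = 0; for await (const batch of readInBatches(sql)) await shipBatch(exportId, i++, batch);`.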

This way you decouple the components and make the overall system scalable. This article describes how this would work.

The microservice first reads the SQS message, which contains the path of the file stored in S3, then downloads the file from S3 and transforms it into CSV format. This workload can also be parallelized easily.
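
A sketch of that consumer, again with the AWS SDK v3 clients; the CSV conversion here is deliberately naive, and a library such as csv-stringify would be a better fit in practice:

```js
// Poll SQS, fetch the referenced NDJSON object from S3, and write a per-batch CSV back to S3.
const { SQSClient, ReceiveMessageCommand, DeleteMessageCommand } = require('@aws-sdk/client-sqs');
const { S3Client, GetObjectCommand, PutObjectCommand } = require('@aws-sdk/client-s3');

const sqs = new SQSClient({});
const s3 = new S3Client({});
const QUEUE_URL = process.env.EXPORT_QUEUE_URL; // placeholder

// Naive CSV escaping; assumes every row has the same keys in the same order.
const toCsvLine = (row) =>
  Object.values(row).map((v) => `"${String(v ?? '').replace(/"/g, '""')}"`).join(',');

async function pollOnce() {
  const { Messages = [] } = await sqs.send(new ReceiveMessageCommand({
    QueueUrl: QUEUE_URL,
    MaxNumberOfMessages: 10,
    WaitTimeSeconds: 20, // long polling
  }));

  // Each message is independent, so batches can be converted in parallel.
  await Promise.all(Messages.map(async (msg) => {
    const { bucket, key } = JSON.parse(msg.Body);

    const obj = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
    const chunks = [];
    for await (const chunk of obj.Body) chunks.push(chunk);
    const rows = Buffer.concat(chunks).toString('utf8')
      .split('\n').filter(Boolean).map((line) => JSON.parse(line));

    await s3.send(new PutObjectCommand({
      Bucket: bucket,
      Key: key.replace(/\.ndjson$/, '.csv'),
      Body: rows.map(toCsvLine).join('\n'),
    }));

    // Only delete the message once the CSV has been written.
    await sqs.send(new DeleteMessageCommand({
      QueueUrl: QUEUE_URL,
      ReceiptHandle: msg.ReceiptHandle,
    }));
  }));
}
```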

If you want the output to be a single CSV file, you can merge the per-batch CSV files asynchronously once they have all been created.
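
One possible merge step, assuming the zero-padded batch keys from the producer sketch (so a plain sort preserves row order) and fewer than 1,000 batch objects per export; for very large outputs, an S3 multipart upload with UploadPartCopy would avoid downloading the data at all:

```js
// Concatenate all per-batch CSVs for one export into a single local file.
const fs = require('fs');
const { S3Client, ListObjectsV2Command, GetObjectCommand } = require('@aws-sdk/client-s3');

const s3 = new S3Client({});

async function mergeCsv(bucket, exportId, outPath, headerLine) {
  const out = fs.createWriteStream(outPath);
  out.write(headerLine + '\n'); // the per-batch files carry no header, so write it once here

  // ListObjectsV2 returns at most 1,000 keys per call; paginate if you have more batches.
  const { Contents = [] } = await s3.send(new ListObjectsV2Command({
    Bucket: bucket,
    Prefix: `exports/${exportId}/`,
  }));

  const keys = Contents.map((o) => o.Key).filter((k) => k.endsWith('.csv')).sort();

  for (const key of keys) {
    const obj = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
    // Stream each batch straight into the output file without buffering it whole.
    await new Promise((resolve, reject) => {
      obj.Body.pipe(out, { end: false });
      obj.Body.on('end', resolve);
      obj.Body.on('error', reject);
    });
    out.write('\n'); // the batch files do not end with a newline
  }
  out.end();
}
```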