I have to read a CSV every 20 seconds. Each CSV contains min. of 500 to max. 60000 lines. I have to insert the data in a Postgres table, but before that I need to check if the items have already been inserted, because there is a high probability of getting duplicate item. The field to check for uniqueness is also indexed.
So, I read the file in chunks and use the IN clause to get the items already in the database.
Is there a better way of doing it?
This should perform well:
Closely related to this answer.