Is there any way to improve to_sql performance in Python?


I have a loop that goes through each file (xlsx and csv) in a directory, reads each one, and appends them to a list, which is then concatenated into a single DataFrame and sent to a database. It works, but when it comes to bigger files it can take a lot of time. I figure this is the part where I need to improve my code:

        xls.to_sql(table, con=engine, if_exists='append', index=False)
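
For context, here is a simplified sketch of what the loop does (the directory, file handling, and connection details are placeholders, not the exact code):

    import glob

    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("...")   # placeholder connection string
    table = "my_table"              # placeholder table name

    frames = []
    for path in glob.glob("my_directory/*"):
        # Each file is read into a DataFrame and collected in a list
        if path.endswith(".xlsx"):
            frames.append(pd.read_excel(path))
        elif path.endswith(".csv"):
            frames.append(pd.read_csv(path))

    # The list is concatenated and pushed to the database in one call
    xls = pd.concat(frames, ignore_index=True)
    xls.to_sql(table, con=engine, if_exists='append', index=False)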

I've been testing different chunksizes, but how can I choose the optimal chunksize when the files come in all kinds of sizes?

I've also been studying parallelization and was wondering if anyone could help me figure out how to improve this.
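
For example, something like this is what I had in mind for the reading step (just a sketch; the directory and worker count are placeholders), though it doesn't touch the to_sql call itself:

    from concurrent.futures import ProcessPoolExecutor
    import glob

    import pandas as pd

    def read_file(path):
        # Pick the reader based on the file extension
        if path.endswith(".xlsx"):
            return pd.read_excel(path)
        return pd.read_csv(path)

    if __name__ == "__main__":
        paths = glob.glob("my_directory/*")   # placeholder directory

        # Read the files in parallel processes, then concatenate as before
        with ProcessPoolExecutor(max_workers=4) as pool:
            frames = list(pool.map(read_file, paths))

        xls = pd.concat(frames, ignore_index=True)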

EDIT: By the way, I've also followed this blog post to improve my database connection, and I noticed a lot of performance improvements. But to_sql is still the slow part.

https://towardsdatascience.com/dramatically-improve-your-database-inserts-with-a-simple-upgrade-6dfa672f1424
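
For anyone finding this later: as far as I understand it, the main connection-level change from that post is enabling fast_executemany on the engine (this assumes SQL Server through pyodbc; the connection string below is a placeholder):

    from sqlalchemy import create_engine

    # fast_executemany=True tells the mssql+pyodbc dialect to send batched
    # parameterized inserts instead of one round trip per row
    engine = create_engine(
        "mssql+pyodbc://user:password@my_dsn",   # placeholder connection string
        fast_executemany=True,
    )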

1 Answer

NYC Coder:

This is what we did in our project:

chunk_size = 2100 // len(df.columns) - 1

And we used method = 'multi' in to_sql as well. This has worked out pretty well for us: method = 'multi' speeds things up quite a bit because it inserts multiple rows per INSERT statement instead of one statement per row.
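
Putting it together, the call looks roughly like this (the 2100 figure is SQL Server's per-statement parameter limit, so adjust it for other databases; the table name and engine are placeholders):

    # Each row consumes len(df.columns) parameters when method='multi'
    # builds the multi-row INSERT, so cap the chunk size accordingly
    chunk_size = 2100 // len(df.columns) - 1

    df.to_sql(
        'my_table',              # placeholder table name
        con=engine,
        if_exists='append',
        index=False,
        chunksize=chunk_size,
        method='multi',          # multiple rows per INSERT statement
    )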