How to execute a Python pipeline with SQL queries in parallel using concurrent.futures or multiprocessing?


I have built a pipeline that fetches data from a MySQL database in batches, one batch per iteration, until it has gone through the entire dataset.

batch_size = 100
offset = 0
while True:
    # fetch the next batch of rows
    await cursor.execute("select * from candidate limit {} offset {}".format(batch_size, offset))
    data = await cursor.fetchall()

    if len(data) == 0:
        break  # stop once the candidate table returns no more rows

    # perform some operations on this data
    # processed data is written to NoSQL database

    # advance the offset by one full batch
    offset += batch_size

Currently this operation is sequential, which means each batch is processed one at a time, and that is causing latency problems. Can anyone help me parallelize this?

How can I process three or four batches in parallel and stop once the entire table has been processed? Please provide some code examples or pseudocode so I can understand the logic and do it properly.
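To make the goal concrete, here is a minimal sketch of one possible shape for this, using concurrent.futures with threads. It is an illustration under assumptions, not a tested solution for the real pipeline: `fetch_batch`, `process_batch`, and `FAKE_TABLE` are hypothetical stand-ins (an in-memory list replaces the `candidate` table so the sketch runs without a database), and it assumes a synchronous driver with one connection per thread rather than the `await`-based cursor above. Each worker claims the next unclaimed offset and exits when it receives an empty batch.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 100
NUM_WORKERS = 4

# Hypothetical stand-in for the candidate table so the sketch runs as-is.
FAKE_TABLE = [{"id": i} for i in range(1050)]


def fetch_batch(offset, limit=BATCH_SIZE):
    """Hypothetical helper: in real code, run the LIMIT/OFFSET query here."""
    return FAKE_TABLE[offset:offset + limit]


def process_batch(rows):
    """Hypothetical helper: transform the rows and write them to the NoSQL store."""
    return len(rows)


def run_pipeline(num_workers=NUM_WORKERS):
    # Shared generator of offsets: 0, 100, 200, ...
    offsets = itertools.count(0, BATCH_SIZE)
    lock = threading.Lock()

    def worker():
        processed = 0
        while True:
            # Claim the next offset so no two workers fetch the same batch.
            with lock:
                offset = next(offsets)
            rows = fetch_batch(offset)
            if not rows:
                # An empty batch means this worker has run past the end of the table.
                return processed
            processed += process_batch(rows)

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(worker) for _ in range(num_workers)]
        # result() re-raises any exception that occurred inside a worker.
        return sum(f.result() for f in futures)


if __name__ == "__main__":
    print(run_pipeline())
```

Threads are used here on the assumption that the work is mostly I/O-bound (database reads and writes). If the per-batch processing is CPU-heavy, a process-based variant would likely be needed, and it would have to be structured differently because the nested worker function cannot be sent to another process. Note also that LIMIT/OFFSET without an ORDER BY clause does not guarantee a stable row order between queries, so batches could overlap or skip rows.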
