I have built a pipeline that fetches data from a MySQL database in batches, one batch per iteration, until the entire dataset has been processed.
offset = 0
batch_size = 100
while True:
    # parameterized query; note OFFSET must not be quoted as a string
    await cursor.execute(
        "SELECT * FROM candidate LIMIT %s OFFSET %s", (batch_size, offset))
    data = await cursor.fetchall()
    if not data:
        break  # stop once there is nothing left in the candidate table
    # perform some operations on this data
    # processed data is written to a NoSQL database
    offset += batch_size  # advance by a full batch for the next iteration
Currently this operation is sequential: each batch is fetched and processed one by one, and this is causing latency issues. Can anyone help me parallelize it?
How can I execute three or four batches in parallel and stop once the entire table has been processed? Please provide code examples or pseudocode illustrating the logic, so I can implement it properly.
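To make the question concrete, here is a minimal, self-contained sketch of the kind of solution I'm imagining: a few `asyncio` workers share a counter of offsets and each claims the next unprocessed batch, stopping when a fetch comes back empty. The `TABLE` list and `fetch_batch` function are stand-ins I made up so the snippet runs without a database; in the real pipeline `fetch_batch` would run the parameterized `SELECT` through the async cursor instead.

```python
import asyncio
import itertools

# Stand-in for the MySQL `candidate` table, so the concurrency logic
# can run without a database (assumption for illustration only).
TABLE = list(range(1050))   # 1050 fake rows
BATCH_SIZE = 100

async def fetch_batch(offset):
    # Real version would be something like (aiomysql-style, assumed):
    #   await cursor.execute(
    #       "SELECT * FROM candidate LIMIT %s OFFSET %s", (BATCH_SIZE, offset))
    #   return await cursor.fetchall()
    await asyncio.sleep(0)  # yield to the event loop, like a DB round-trip would
    return TABLE[offset:offset + BATCH_SIZE]

async def worker(offsets, results):
    while True:
        offset = next(offsets)         # claim the next unprocessed offset
        batch = await fetch_batch(offset)
        if not batch:                  # empty batch: we are past the end of the table
            return
        # ... process the batch and write it to the NoSQL store here ...
        results.extend(batch)

async def run_pipeline(num_workers=4):
    offsets = itertools.count(0, BATCH_SIZE)  # 0, 100, 200, ... shared by all workers
    results = []
    await asyncio.gather(*(worker(offsets, results)
                           for _ in range(num_workers)))
    return results

rows = asyncio.run(run_pipeline())
print(f"processed {len(rows)} rows")  # processed 1050 rows
```

Two caveats I'm aware of: each worker would need its own connection/cursor, since a single cursor cannot run queries concurrently, and `LIMIT/OFFSET` pagination can skip or duplicate rows if the table is being modified while the pipeline runs. Is this the right general shape, or is there a better pattern?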