I am new to Python, and I am wondering if the following can be easily achieved with Web2Py.
Background:
I have a web form through which users can upload a zip file, and the webapp will process the files inside and display the result. Everything is working fine, but the way I built the webapp is single-threaded, i.e. the webapp unzips the zip file, processes the file(s) inside one by one, and displays the result when all files are processed.
I use a third party script to process the files, which will not prompt anything when the job is finished. It will only create a result file on disk.
Since each file takes at least 2-5 minutes to process, and I have a cloud server with 12 vCPUs, the way I am doing it now is extremely inefficient.
Goal:
I want it to run multi-threaded, i.e. put all the files in a queue and process them in parallel.
Problem:
Say the maximum number of workers is set to 12. If a user uploaded less than 12 files, then it should create a worker for each file and process them in parallel.
I have done this part using threading. The result page refreshes every 5 minutes if the user wants to wait on the page for the result (or the user can come back at any time to see the result in his account). The webapp simply checks the disk to see whether a result file exists and, if it does, displays it.
How can I achieve:
If a user uploads more than 12 files (say 30 files, for example), 12 workers should be created to process the first 12 files. Whenever a worker finishes processing a file, a new worker should be created to take the next file from the queue and process it. Is there a way that, as soon as a worker finishes running the script, a new worker is created to run?
Since I am new to Python (new to programming, actually), a simple way to achieve that will be much appreciated. Thank you.
Since you're processing the files in a third-party script, this is not really related to web2py.
Say `files_count` is the number of files your application needs to process. Then you can cap the number of workers with a simple condition, e.g. `threads_count = min(files_count, 12)`. Now `threads_count` holds the number of threads you're able to launch, with a maximum of 12. Next, create a queue (`queue.Queue` is thread-safe) that will be consumed by each thread/worker, and populate it with the file names.
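A minimal sketch of those two steps might look like the following (the file names and the `filenames` variable are placeholders for whatever your unzip step produces):

```python
import queue

# Placeholder: in the real app these come from the unzipped upload.
filenames = ["file01.dat", "file02.dat", "file03.dat"]
files_count = len(filenames)

# Never launch more threads than files, and never more than 12.
threads_count = min(files_count, 12)

# queue.Queue is thread-safe, so all workers can share it directly.
q = queue.Queue()
for name in filenames:
    q.put(name)
```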
Finally, since you already have your `worker` function, make the necessary changes so that it receives the queue `q` as a parameter. You don't know in advance how many files will be processed by each worker: as soon as one worker is available, it'll immediately consume the next task from the queue, or exit if the queue is empty.
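Putting it together, here is a hedged sketch of that worker loop, with a dummy `process_file` standing in for your third-party script (in the real app it would run the external tool and write a result file to disk):

```python
import queue
import threading

def process_file(name):
    # Stand-in for the third-party script.
    return f"{name}.result"

def worker(q, results):
    # Keep pulling file names until the queue is drained, then exit.
    while True:
        try:
            name = q.get_nowait()
        except queue.Empty:
            return
        results.append(process_file(name))
        q.task_done()

filenames = [f"file{i:02d}.dat" for i in range(30)]
q = queue.Queue()
for name in filenames:
    q.put(name)

threads_count = min(len(filenames), 12)
results = []
threads = [threading.Thread(target=worker, args=(q, results))
           for _ in range(threads_count)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# All 30 files end up processed by at most 12 threads.
```

Note that with this design you never need to create a new worker when one finishes: each of the (at most) 12 workers simply loops back and takes the next file from the queue, which achieves the same effect.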
Hope it helps!