How to choose between multiprocessing and multitasking in Python for file processing?


I looked at other StackExchange threads on this topic, but it seems I need further assistance in understanding it.

Please take a look at the following scenario and explain which method should be used and why.

I have already written Python code that loads a folder and extracts the file.txt, then calls the function "File_Processing", which processes the individual file, plots x and y, and saves the plot. It takes 20 minutes per 100 files, and I have several folders containing 3000 files each.

Now my question is: which method should be used, multiprocessing or multitasking, and why?

Best answer

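Check out multiprocessing, it is part of the standard library: https://docs.python.org/3/library/multiprocessing.html

Each file is processed independently, and parsing plus plotting is mostly CPU work, so separate processes can use several cores at once. Threads in standard CPython are held back by the global interpreter lock for that kind of work.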

What you need is almost exactly the most basic example:

from glob import glob
from multiprocessing import Pool

list_of_filenames = glob("/path/to/files/*.txt")

def f(filename):
    ...  # body of your for loop goes here, e.g. the call to File_Processing(filename)

if __name__ == "__main__":
    with Pool(5) as p:
        p.map(f, list_of_filenames)

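Do not forget the if __name__ == "__main__": guard. On platforms where worker processes are started with the spawn method (the default on Windows and macOS), every worker re-imports the main module, so without the guard the pool creation runs again inside each worker and the script typically fails with a RuntimeError instead of doing any work.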
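If you want something a bit closer to your scenario, here is a minimal sketch of the same idea. It is only an illustration under assumptions: process_file is a hypothetical stand-in for your File_Processing (it just counts lines), process_folder is a helper name I made up, and /path/to/files is the same placeholder as above. It uses imap_unordered so results come back as workers finish, and it returns errors instead of raising so one bad file does not abort the whole folder.

```python
import os
from glob import glob
from multiprocessing import Pool


def process_file(filename):
    """Hypothetical stand-in for File_Processing: counts the lines in one file.

    Returns (filename, line_count, error_message); exactly one of the last
    two is None.
    """
    try:
        with open(filename, encoding="utf-8") as handle:
            n_lines = sum(1 for _ in handle)
        return filename, n_lines, None
    except OSError as exc:
        # Report the failure instead of raising, so the other files still run.
        return filename, None, str(exc)


def process_folder(folder, workers=None, chunksize=16):
    """Run process_file on every .txt file in `folder` using worker processes.

    workers=None lets Pool choose a count based on the available CPUs.
    """
    filenames = sorted(glob(os.path.join(folder, "*.txt")))
    if not filenames:
        return []  # nothing to do, so do not start any workers
    with Pool(processes=workers) as pool:
        # Results arrive in completion order, not in filename order.
        return list(pool.imap_unordered(process_file, filenames, chunksize=chunksize))


if __name__ == "__main__":
    for name, n_lines, error in process_folder("/path/to/files"):
        if error is not None:
            print(f"FAILED {name}: {error}")
        else:
            print(f"{name}: {n_lines} lines")
```

As a rough estimate from your numbers: 3000 files at 20 minutes per 100 files is about 10 hours per folder when run serially. With 5 workers the best case would be around 2 hours, assuming the work really is CPU-bound and you have at least 5 free cores. Real speedups are usually somewhat lower because of disk access and process startup.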