I have a file that I want to process in parallel using Python's multiprocessing module. My current code is:
from multiprocessing import Pool, cpu_count

class rand:
    def __init__(self):
        self.rando = "world"

def do_work2(obj, line):
    return line + obj.rando

if __name__ == "__main__":
    num_workers = cpu_count() - 2
    pool = Pool(num_workers)
    ran = rand()
    with open("sample.txt") as f:
        # chunk the work into batches of 4 lines at a time
        results = pool.starmap(do_work2, zip(ran,f), 4)
    print(results)
I expect to see every line of my file with "world" appended at the end. However, when I run this code I get:
TypeError: 'rand' object is not iterable
I get why it is happening, but I am wondering whether there is a way to send a class object to a function and then use that object inside the function, all while multiprocessing.
Can someone help me, please?
As Michael notes, the error comes about because zip expects each of its arguments to be iterable, while an instance of your rand class is not. While Chems' fix works, it needlessly takes up memory, and doesn't account for how large the file is. I'd prefer pairing each line with ran via itertools.repeat, i.e. zip(repeat(ran), f). repeat produces an infinite number of ran objects (until you quit asking for them). This means that it will produce as many rans as f has lines, without taking up memory in a separate list before being given to zip, and without needing to calculate how many lines f has.

I'd just scrap using pool.starmap and use normal pool.map though. You can wrap your function in another function, and supply ran as the first argument. There are two ways of doing this: the quick-and-dirty lambda way, or the arguably more correct way of using partial. See here for why you may want to prefer partial to a plain lambda.
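The repeat-based pairing could look roughly like the sketch below. It reuses rand and do_work2 from the question; pair_with and run_with_repeat are hypothetical helper names introduced only for illustration, and an in-memory list stands in for the lines of sample.txt so the example does not depend on a file.

```python
from itertools import repeat
from multiprocessing import Pool


class rand:
    def __init__(self):
        self.rando = "world"


def do_work2(obj, line):
    # Runs in a worker process; obj arrives there as a pickled copy.
    return line + obj.rando


def pair_with(obj, lines):
    # Hypothetical helper: lazily yields (obj, line) tuples.
    # zip stops when `lines` runs out, so the infinite repeat is safe here.
    return zip(repeat(obj), lines)


def run_with_repeat(lines, num_workers=2):
    # Hypothetical wrapper: starmap unpacks each tuple into do_work2(obj, line).
    ran = rand()
    with Pool(num_workers) as pool:
        # The third argument is the chunksize (4 items per batch).
        return pool.starmap(do_work2, pair_with(ran, lines), 4)


if __name__ == "__main__":
    # Assumption: these strings stand in for the lines read from sample.txt.
    sample_lines = ["hello ", "goodbye "]
    print(run_with_repeat(sample_lines))
```

With a real file, passing the open file object f in place of sample_lines should work the same way, since iterating a file yields its lines. Those lines keep their trailing newline, so stripping it before concatenating may be necessary.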
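A minimal sketch of the partial approach follows, again reusing rand and do_work2 from the question. run_with_partial is a hypothetical wrapper name, and the list of strings is a stand-in for the file's lines. One practical reason to favor partial here: a process-based Pool pickles the callable it sends to workers, and a plain lambda cannot be pickled by the standard pickle module, whereas a partial of a module-level function can.

```python
from functools import partial
from multiprocessing import Pool


class rand:
    def __init__(self):
        self.rando = "world"


def do_work2(obj, line):
    return line + obj.rando


def run_with_partial(lines, num_workers=2):
    # Hypothetical wrapper around the pool.map call.
    ran = rand()
    # partial fixes ran as the first argument, leaving a one-argument
    # callable that pool.map can apply to each line.
    work = partial(do_work2, ran)
    with Pool(num_workers) as pool:
        # The third argument is the chunksize (4 lines per batch).
        return pool.map(work, lines, 4)


if __name__ == "__main__":
    # Assumption: these strings stand in for the lines read from sample.txt.
    sample_lines = ["hello ", "goodbye "]
    print(run_with_partial(sample_lines))
```

Because pool.map takes a plain iterable of single arguments, no zip or repeat is needed in this version.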