When running the following df.to_csv() line as part of a multiprocessing process:
filtered_chunk.to_csv(os.path.join(output_dir, "filtered_{}".format(csv_filename)), mode='a+')
I get the following error and can't figure out what the issue is (the snippet below is for one multiprocessing process):
Process ForkPoolWorker-13:
Traceback (most recent call last):
File "/home/User1/miniconda3/envs/part_ii_dev-conda/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
File "/home/User1/miniconda3/envs/part_ii_dev-conda/lib/python3.8/multiprocessing/process.py", line 108, in run
File "/home/User1/miniconda3/envs/part_ii_dev-conda/lib/python3.8/multiprocessing/pool.py", line 109, in worker
<insert stacktrace for code I wrote...>
filtered_chunk.to_csv(os.path.join(output_dir, "filtered_{}".format(csv_filename)), mode='a+')
File "/home/User1/miniconda3/envs/part_ii_dev-conda/lib/python3.8/site-packages/pandas/core/generic.py", line 3170, in to_csv
File "/home/User1/miniconda3/envs/part_ii_dev-conda/lib/python3.8/site-packages/pandas/io/formats/csvs.py", line 185, in save
File "/home/User1/miniconda3/envs/part_ii_dev-conda/lib/python3.8/site-packages/pandas/io/common.py", line 418, in get_handle
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 971, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 914, in _find_spec
File "<frozen importlib._bootstrap_external>", line 1342, in find_spec
File "<frozen importlib._bootstrap_external>", line 1314, in _get_spec
File "<frozen importlib._bootstrap_external>", line 1443, in find_spec
File "<frozen importlib._bootstrap_external>", line 1483, in _fill_cache
OSError: [Errno 127] Key has expired: '/home/User1/miniconda3/envs/part_ii_dev-conda/lib/python3.8'
After every process fails with the stacktrace above, I get a "Bus error (core dumped)" printout.
"Key has expired" makes me think of something that requires verification or provides time-limited access (file pointers? keys/tickets to access a server?). I ran the script in a tmux session on a server, so I don't think access to the server would have been an issue (some googling of error 127 pointed me to mounting issues, but I don't think that applies here). Since the script failed in the get_handle method, which basically returns a file handle, file pointers seem to be the problem, but I'm not sure why.
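For reference, the error number can be looked up from Python itself. This is only a small sketch, assuming a Linux host (where errno 127 is EKEYEXPIRED); describe_os_error is a hypothetical helper name, not part of pandas or the standard library. It prints the symbolic name, the message, and the path attached to the exception, which in the traceback above is the Python lib directory rather than the CSV output path.

```python
import errno
import os


def describe_os_error(exc):
    """Summarise an OSError: numeric code, symbolic name, message and path."""
    # errno.errorcode maps numeric codes to names such as 'ENOENT';
    # .get() avoids a KeyError on platforms that do not define the code.
    name = errno.errorcode.get(exc.errno, "unknown")
    return "errno={} ({}): {} [path={!r}]".format(
        exc.errno, name, exc.strerror, exc.filename
    )


# Rebuild an error with the same number and path as in the traceback.
# os.strerror() returns the platform's own message for that number.
example = OSError(
    127,
    os.strerror(127),
    "/home/User1/miniconda3/envs/part_ii_dev-conda/lib/python3.8",
)
print(describe_os_error(example))
```

Calling the same helper inside an except OSError block in each worker would show whether every process fails on the same path.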
If it helps, I'm using multiprocessing to parallelise a function, but different processes write to different CSV files, so that shouldn't be an issue here.
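A minimal sketch of that one-file-per-task layout, not the actual script: filtered_path, append_rows and the task tuples are hypothetical names, and the standard-library csv module stands in for DataFrame.to_csv(mode='a+') so the example stays self-contained.

```python
import csv
import os
import tempfile
from multiprocessing import Pool


def filtered_path(output_dir, csv_filename):
    # Same naming scheme as the to_csv call above.
    return os.path.join(output_dir, "filtered_{}".format(csv_filename))


def append_rows(task):
    # Hypothetical worker: appends rows to the file owned by this task.
    output_dir, csv_filename, rows = task
    path = filtered_path(output_dir, csv_filename)
    with open(path, "a", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as output_dir:
        tasks = [
            (output_dir, "part_{}.csv".format(i), [[i, i * 2]])
            for i in range(4)
        ]
        with Pool(processes=2) as pool:
            paths = pool.map(append_rows, tasks)
        # One distinct output file per task, so no two workers share a file.
        assert len(set(paths)) == len(tasks)
```

With this layout no two workers ever open the same output file, which is why concurrent appends seem unlikely to be the cause.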