Error message when using Python multiprocessing


I need to convert a very large .bam file to a .bed file. I found a solution using bedops's parallel bam2bed, but it only supports SGE and GNU Parallel, whereas the two clusters I can access only support the Slurm and Torque schedulers. I do not know much about tcsh, so I cannot modify the script to meet the requirements of Slurm and Torque.

Since I know a little Python, I plan to use Python's multiprocessing module to do this. However, the following code raises a weird message:

"Python quit unexpectedly while using the calignmentfile.so plug-in"

# The code here is just a test code, ignore its real meaning.
import multiprocessing as mp
import pysam

def work(read):
    return read.query
    # return read.split()[0]

if __name__ == '__main__':
    cpu = mp.cpu_count()
    pool = mp.Pool(cpu)

    sam = pysam.AlignmentFile('foo.bam', 'rb')
    read = sam.fetch(until_eof=True)

    # f = open('foo.text', 'rb')
    # results = pool.map(work, f, cpu)

    results = pool.map(work, read, cpu)
    print(results)

Does this message mean that the reads from pysam.AlignmentFile() do not support parallelism, or that Python does not support this kind of parallelism? When I test this code with a regular text file, it works well (see the commented-out lines).


There is 1 best solution below


pysam does indeed have some problems with concurrency. If you look at the source code for fetch, you'll see there is a problem with concurrency when iterating over its return types.
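One workaround that is often suggested, sketched here under a few assumptions, is to avoid sending pysam objects between processes at all: give each worker only a file path and a contig name, and let the worker open the BAM file itself. The helper names make_tasks, read_names and run are made up for illustration. The sketch assumes the file is indexed (a .bai next to foo.bam), since fetch(contig) needs an index, and it skips unmapped reads that have no contig.

```python
import multiprocessing as mp


def make_tasks(bam_path, contigs):
    """Pair the BAM path with each contig name: one task per contig."""
    return [(bam_path, contig) for contig in contigs]


def read_names(task):
    """Worker: open the BAM file inside this process and read one contig."""
    import pysam  # imported inside the function; the handle below is local to this worker

    bam_path, contig = task
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        # fetch(contig) requires an index file; return plain strings only
        return [read.query_name for read in bam.fetch(contig)]


def run(bam_path, processes=None):
    """Collect read names from every contig in parallel (hypothetical driver)."""
    import pysam

    # Only strings and tuples cross the process boundary, never pysam objects.
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        contigs = list(bam.references)
    with mp.Pool(processes) as pool:
        per_contig = pool.map(read_names, make_tasks(bam_path, contigs))
    return [name for names in per_contig for name in names]
```

Call run('foo.bam') from inside an if __name__ == '__main__': guard, as in the question. Whether this avoids the crash on a given pysam version is not something verified here; the point is only that each process works with its own file handle and returns ordinary Python data.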