I need to convert a very large .bam file to a .bed file. I found a solution using BEDOPS's parallel bam2bed, but that parallel version supports SGE and GNU Parallel, while the two clusters I can access support only the Slurm and Torque schedulers. I do not know much about tcsh, so I cannot even modify the script to meet the requirements of Slurm and Torque.
Since I know a little Python, I plan to use Python's multiprocessing module to do this. However, the following code raises a weird message:
"Python quit unexpectedly while using the calignmentfile.so plug-in"
# This is just test code; ignore what it actually computes.
import multiprocessing as mp
import pysam


def work(read):
    return read.query
    # return read.split()[0]


if __name__ == '__main__':
    cpu = mp.cpu_count()
    pool = mp.Pool(cpu)
    sam = pysam.AlignmentFile('foo.bam', 'rb')
    read = sam.fetch(until_eof=True)
    # f = open('foo.text', 'rb')
    # results = pool.map(work, f, cpu)
    results = pool.map(work, read, cpu)
    print(results)
Does this message mean that the reads from pysam.AlignmentFile() do not support parallelism, or that Python doesn't support this kind of parallelism? When I test this code with a regular text file (the commented-out lines), it works well.
pysam indeed has some problems with concurrency. If you look at the source code for fetch, you'll see there's a problem with concurrency and iterating its return types.
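One possible workaround, sketched below, is to avoid sending pysam objects between processes at all: pass each worker only plain, picklable values (the BAM path and a contig name) and let every worker open its own AlignmentFile. This is a minimal sketch, not a drop-in replacement for bam2bed. It assumes the BAM is coordinate-sorted and indexed (fetching by contig needs the index), it writes a simplified six-column BED rather than BEDOPS's full output, and it skips unmapped reads. The helper names (format_bed, make_tasks, convert_contig, convert) and the per-contig output naming are hypothetical choices for illustration.

```python
import multiprocessing as mp


def format_bed(contig, start, end, name, score, strand):
    """Format one BED6 line (0-based start, end exclusive)."""
    return "\t".join(str(x) for x in (contig, start, end, name, score, strand))


def make_tasks(bam_path, contigs, out_prefix):
    """Build one picklable (bam_path, contig, out_path) tuple per contig."""
    return [(bam_path, c, f"{out_prefix}.{c}.bed") for c in contigs]


def convert_contig(task):
    """Worker: open the BAM in this process and write one contig as BED."""
    import pysam  # imported in the worker; each process opens its own handle

    bam_path, contig, out_path = task
    n = 0
    with pysam.AlignmentFile(bam_path, "rb") as bam, open(out_path, "w") as out:
        for read in bam.fetch(contig):  # requires an index, e.g. foo.bam.bai
            if read.is_unmapped or read.reference_end is None:
                continue  # no usable interval for this read
            strand = "-" if read.is_reverse else "+"
            line = format_bed(contig, read.reference_start, read.reference_end,
                              read.query_name, read.mapping_quality, strand)
            out.write(line + "\n")
            n += 1
    return out_path, n


def convert(bam_path, out_prefix, processes=None):
    """Convert an indexed BAM to per-contig BED files in parallel."""
    import pysam

    # Read only the contig names in the parent; no reads cross processes.
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        contigs = list(bam.references)
    with mp.Pool(processes) as pool:
        return pool.map(convert_contig, make_tasks(bam_path, contigs, out_prefix))
```

To use it, call something like convert('foo.bam', 'foo') under an `if __name__ == '__main__':` guard, then concatenate the per-contig files in contig order. Splitting by whole contig keeps the sketch simple, because no read can be reported by two workers. The trade-off is that the speed-up is limited by the largest contig.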