Replace concatenation with join


I have the following code:

for pileupcolumn in samfile.pileup(max_depth=1000000):
    X.append(pileupcolumn.n)
    for pileupread in pileupcolumn.pileups:
        if pileupread.alignment.mapping_quality <= 15:
            continue
        if not pileupread.is_del and not pileupread.is_refskip:
            if pileupread.alignment.query_qualities[pileupread.query_position] < 30:
                # Skip entries with base phred scores < 30
                continue
            if pileupread.alignment.is_reverse:  # negative strand
                ReverseList[pileupcolumn.pos] += pileupread.alignment.query_sequence[pileupread.query_position]
            else:
                ForwardList[pileupcolumn.pos] += pileupread.alignment.query_sequence[pileupread.query_position]

The above code takes a long time to run, and I want to replace the string concatenation on the 11th and 13th lines with join. Is there a way to do this?




Instead of concatenating inside the inner loop, collect the values into a list, then join the list once the inner loop finishes.

for pileupcolumn in samfile.pileup(max_depth=1000000):
    X.append(pileupcolumn.n)
    forward = []
    reverse = []
    for pileupread in pileupcolumn.pileups:
        if pileupread.alignment.mapping_quality <= 15:
            continue
        if not pileupread.is_del and not pileupread.is_refskip:
            if pileupread.alignment.query_qualities[pileupread.query_position] < 30:
                # Skip entries with base phred scores < 30
                continue
            if pileupread.alignment.is_reverse:  # negative strand
                reverse.append(pileupread.alignment.query_sequence[pileupread.query_position])
            else:
                forward.append(pileupread.alignment.query_sequence[pileupread.query_position])
    # Join once per column instead of concatenating once per read
    ReverseList[pileupcolumn.pos] += ''.join(reverse)
    ForwardList[pileupcolumn.pos] += ''.join(forward)

I still use concatenation at the end because this optimization only applies to the inner for pileupread loop. If different pileupcolumn objects share the same pos, their results have to be concatenated at that point. The concatenation is also needed if the ReverseList (or ForwardList) elements already hold values before this code runs.
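
If the per-position results do not need to exist as strings until the whole scan is done, the remaining concatenation could probably be dropped too, by keeping a list of bases per position and joining each list once at the very end. The following is only a sketch under assumptions: collect_bases and the (pos, is_reverse, base) tuples are hypothetical stand-ins for the filtered pileup reads, and the results are returned as dicts keyed by position rather than written into ForwardList and ReverseList.

```python
from collections import defaultdict


def collect_bases(records):
    """Group bases by position and strand, joining each group only once.

    `records` is a hypothetical iterable of (pos, is_reverse, base) tuples
    standing in for the reads that passed the quality filters.
    """
    forward_parts = defaultdict(list)
    reverse_parts = defaultdict(list)

    for pos, is_reverse, base in records:
        # Appending to a list is cheap; no intermediate strings are built.
        target = reverse_parts if is_reverse else forward_parts
        target[pos].append(base)

    # One join per position, after all records have been seen.
    forward = {pos: "".join(parts) for pos, parts in forward_parts.items()}
    reverse = {pos: "".join(parts) for pos, parts in reverse_parts.items()}
    return forward, reverse


records = [(0, False, "A"), (0, True, "C"), (1, False, "G"), (0, False, "T")]
forward, reverse = collect_bases(records)
print(forward)  # {0: 'AT', 1: 'G'}
print(reverse)  # {0: 'C'}
```

Because the lists are keyed by position, records that repeat a pos are simply appended to the same list, so no string concatenation is needed for that case. Values that already exist in ForwardList or ReverseList would still have to be merged in separately.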