Iterate string to compare an i element to i+1 element python

I have a DNA sequence:

seq='AACGTTCAA'

I want to count how many letters are equal to the next one. In this example I should get 3 (because of AA-TT-AA).

In my first try I found out that this doesn't work, because i is a string and 1 an integer.

seq='AACGTTCAA'
count=[]
for i in seq:
    if i == i+1: #neither i+=1
        count.append(True)
    else: count.append(False)
print(sum(count))  

So I tried this:

seq='AACGTTCAA'
count=[]
for i in seq:
    if i == seq[seq.index(i)+1]:
        count.append(True)
    else: count.append(False)
print(sum(count))  

Then I receive this output, which I cannot understand. Three of these True values should be False (1, 5, 8), especially 8, as it is the last element of the string.

6
[True, True, False, False, True, True, False, True, True] 

I thought about doing this with arrays, but I think there might be an easy way to do this with just strings. Thanks

There are 4 best solutions below

0
On BEST ANSWER

To answer your question: the statement for i in seq yields a series of one-character strings such as 'A', 'A', 'C', and so on. In your first attempt, the comparison i == i+1 tries to add 1 to a string, which raises a TypeError. In your second attempt, if i == seq[seq.index(i)+1] gives wrong results because seq.index(i) always returns the index of the first occurrence of the value. Every 'A', for example, is compared with seq[1], including the last one, which is also why no error is raised at the end of the string. At a basic level, you can do the following:

def countPairedLetters(seq):
    count = 0
    for i in range(1, len(seq)):
        # i starts with 1 and ends with len(seq)-1
        if seq[i-1] == seq[i]:
            count += 1
    return count    

Note: by starting at index 1 and comparing each element with the one before it, you avoid running past the end of the sequence.
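
The same adjacent comparison can also be written without explicit indices. This is a minimal sketch, not part of the answer above: the helper name count_adjacent_equal is hypothetical, and it pairs each character with its successor using zip.

```python
def count_adjacent_equal(seq):
    # zip stops at the shorter input, so the last character is never
    # compared with a non-existent successor.
    return sum(a == b for a, b in zip(seq, seq[1:]))

seq = 'AACGTTCAA'
result = count_adjacent_equal(seq)
print(result)  # 3
```

Because True counts as 1 when summed, no intermediate list of booleans is needed, and an empty or one-character string simply gives 0.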

6
On

Using itertools is one way:

from itertools import groupby
seq = 'AACGTTCAA'
print(sum(len(list(g))-1 for k,g in groupby(seq)))

This splits the sequence into groups of consecutive identical letters, then adds each group's length minus 1 to the total.
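
To make the grouping step visible, the following sketch prints the run lengths that groupby produces for the example sequence before they are reduced to a count. The variable names runs and total are illustrative only.

```python
from itertools import groupby

seq = 'AACGTTCAA'

# Each group is a run of identical consecutive letters; record its length.
runs = [(letter, len(list(group))) for letter, group in groupby(seq)]
print(runs)
# [('A', 2), ('C', 1), ('G', 1), ('T', 2), ('C', 1), ('A', 2)]

# A run of length n contains n - 1 positions that equal the next one.
total = sum(length - 1 for _, length in runs)
print(total)  # 3
```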

Edit: Updated with mozway's comments.

0
On

The unwanted True values come from seq.index().

index() always returns the first occurrence of the character you are searching for. If the first occurrence of a character is followed by the same character, every later occurrence of that character is looked up at that same first position, so the comparison always matches.
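
A short sketch of this behaviour on the sequence from the question, with enumerate shown for contrast. The variable names are illustrative.

```python
seq = 'AACGTTCAA'

# str.index returns the position of the FIRST match, so repeated letters
# all map back to the same index.
first_positions = [seq.index(c) for c in seq]
print(first_positions)  # [0, 0, 2, 3, 4, 4, 2, 0, 0]

# enumerate yields the real position of each character alongside it.
real_positions = [i for i, c in enumerate(seq)]
print(real_positions)   # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

Every 'A' resolves to index 0, so the original loop compares seq[0] with seq[1] each time it sees an 'A', including the last one.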

Here is a quick solution:

seq='AACGTTCAA'
count=[]
for i in range(0,len(seq)-1):  # stop one short so seq[i+1] stays in range
    if seq[i]==seq[i+1]:
        count.append(True)
    else: count.append(False)
print(count)
print(sum(count))  # number of letters equal to the next one

7
On

You can do this:

for i in range(0, len(seq)):
   if seq[i] == seq[i+1]: # <- this causes an error
      count.append(True)

However, you have to make sure that seq[i+1] does not go out of range: on the last iteration, i+1 equals len(seq) and raises an IndexError.

Update

count = 0
for i in range(0, len(seq)-1): # this prevents an error
   if seq[i] == seq[i+1]:
      count += 1