How to find all possible consecutive triplets in a string?


Given a string of DNA, how can I create a list of all possible consecutive (overlapping) triplets? For instance, given the following string:

ACCTAA

I need to create a list of all possible consecutive triplets, such that:

ACC, CCT, CTA, TAA

How could I accomplish that?

So far, I have only figured out how to create a list of triplets by dividing the string at equal intervals:

list_of_triplet = [dna[i:i+3] for i in range(0, len(dna), 3)]

Where dna is the input string.

Thank you for any suggestions!


There are 2 best solutions below


You're almost there. Drop the third argument to range: the step of 3 is what splits the string into non-overlapping groups, and you want the window to advance one character at a time. The last full triplet starts at index len(dna) - 3, so the stop value should be len(dna) - 2. With these changes, you have:

list_of_triplet = [dna[i:i+3] for i in range(0, len(dna) - 2)]
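The same idea can be wrapped in a small helper. This is only a sketch: the function name `sliding_triplets` and the `size` parameter are hypothetical, introduced here for illustration. It also shows what happens when the input is shorter than three characters.

```python
def sliding_triplets(dna, size=3):
    """Return every overlapping window of `size` characters, in order.

    Hypothetical helper for illustration; not part of the original answer.
    """
    # range() is empty when len(dna) < size, so short inputs give [].
    return [dna[i:i + size] for i in range(len(dna) - size + 1)]


print(sliding_triplets("ACCTAA"))  # ['ACC', 'CCT', 'CTA', 'TAA']
print(sliding_triplets("AC"))      # []
```

With size=3 the stop value `len(dna) - size + 1` equals `len(dna) - 2`, which matches the expression above.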

If you don't want repeated triplets, you can use a set comprehension instead (note that a set does not preserve the order in which the triplets appear):

unique_triplets = {dna[i:i+3] for i in range(0, len(dna) - 2)}
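If you need the unique triplets in order of first appearance, one possible approach (an alternative, not something the answer above proposes) is to deduplicate with dict.fromkeys, which relies on dictionaries preserving insertion order in Python 3.7+. The sample string "ACCACC" is chosen here only because it contains a repeated triplet.

```python
dna = "ACCACC"  # sample input with a repeated triplet (assumption for the demo)

# All overlapping triplets, including duplicates.
triplets = [dna[i:i+3] for i in range(len(dna) - 2)]

# dict keys are unique and keep insertion order (Python 3.7+).
ordered_unique = list(dict.fromkeys(triplets))

print(triplets)        # ['ACC', 'CCA', 'CAC', 'ACC']
print(ordered_unique)  # ['ACC', 'CCA', 'CAC']
```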

You have a multitude of options.

With a generator expression

unique_triplets = set(dna[i:i+3] for i in range(len(dna) - 2))
print(unique_triplets)
# {'ACC', 'TAA', 'CTA', 'CCT'}  (set order may vary)

With an explicit loop

unique_triplets = set()
for i in range(len(dna) - 2):
    unique_triplets.add(dna[i:i+3])
print(unique_triplets)
# {'ACC', 'TAA', 'CTA', 'CCT'}  (set order may vary)

If you want to count how many times each triplet occurs, use a defaultdict:

from collections import defaultdict
unique_triplets = defaultdict(int)
for i in range(len(dna) - 2):
    unique_triplets[dna[i:i+3]] += 1

print(unique_triplets)
# defaultdict(<class 'int'>, {'ACC': 1, 'CCT': 1, 'CTA': 1, 'TAA': 1})
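As a possible alternative to the defaultdict loop, collections.Counter from the standard library can do the counting directly from a generator expression. This is a sketch with a sample input ("ACCACC") picked so that one triplet repeats.

```python
from collections import Counter

dna = "ACCACC"  # sample input with a repeated triplet (assumption for the demo)

# Counter consumes the generator and tallies each triplet.
triplet_counts = Counter(dna[i:i+3] for i in range(len(dna) - 2))

print(triplet_counts["ACC"])          # 2
print(triplet_counts.most_common(1))  # [('ACC', 2)]
```

Counter also returns 0 for a triplet that never occurs, instead of raising KeyError.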