How to use MSA and Clustal for python inside a Jupyter notebook?

I have a FASTA file with sequences associated with states and their cities. Is it possible to use Python in a Jupyter notebook to run a multiple sequence alignment (MSA) with Clustal, then create a phylogenetic tree from the aligned sequences? I am not sure where to start, and there was no clear direction when I was given the assignment.

There is 1 best solution below

Disclaimer: I have no background in biology.

As far as I understand, a FASTA file contains sequences of letters, and aligning means lining the sequences up against each other, inserting gaps where needed, to find where sequence #1 contains or partially overlaps with sequence #2. That is string manipulation, which Python is very good at. For the simplest case, you could write a function that takes two strings and returns the aligned result.
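To get from a FASTA file to plain Python strings, something like the following might be enough. It is only a minimal sketch using the standard library: `parse_fasta` is a name I made up, the record names in the example are hypothetical, and it assumes each record starts with a `>` header line followed by one or more sequence lines.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record begins; everything after '>' is its name.
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical example data, not taken from the question.
example = """>Austin_TX
AAATCGG
AAA
>Boston_MA
CGGA
"""
sequences = parse_fasta(example)
print(sequences)
```

For a real file you would read it first, for example with `open(path).read()`, and pass the text to the function. The resulting dict has the same shape as the input expected by the library example further down.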

I found a library on GitHub that seems to do this, although I don't know whether using it is permitted in your case. The following code fragment is taken from its documentation: https://github.com/benchling/clustalo-python

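from clustalo import clustalo

# Map each sequence name to its unaligned sequence.
# (Renamed from `input` to avoid shadowing Python's built-in input().)
sequences = {
    'seq1': 'AAATCGGAAA',
    'seq2': 'CGGA'
}
aligned = clustalo(sequences)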
# aligned is a dict of aligned sequences:
#   seq1: AAATCGGAAA
#   seq2: ----CGGA--

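Once the sequences are aligned, you can estimate how similar each pair is and group the most similar sequences together first, which gives you the order of the branches in the tree.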
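Here is a rough sketch of that idea in plain Python. It is not what Clustal itself does: I am assuming a simple distance (the fraction of aligned columns that differ) and average-linkage clustering (often called UPGMA), and the names `p_distance` and `upgma` and the example sequences are my own. The tree is returned as nested tuples of sequence names.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned columns in which the two sequences differ.

    Columns where both sequences have a gap are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(aligned):
    """Average-linkage clustering; returns a nested tuple of names."""
    # Each key is a subtree (a name or a tuple), each value its leaf names.
    clusters = {name: [name] for name in aligned}
    # Pairwise distances between the original sequences.
    dist = {
        frozenset(pair): p_distance(aligned[pair[0]], aligned[pair[1]])
        for pair in combinations(aligned, 2)
    }

    def cluster_distance(t1, t2):
        # Mean distance over all leaf pairs drawn from the two clusters.
        leaves1, leaves2 = clusters[t1], clusters[t2]
        total = sum(dist[frozenset((x, y))] for x in leaves1 for y in leaves2)
        return total / (len(leaves1) * len(leaves2))

    while len(clusters) > 1:
        # Merge the two closest clusters into one subtree.
        t1, t2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        merged_leaves = clusters.pop(t1) + clusters.pop(t2)
        clusters[(t1, t2)] = merged_leaves
    return next(iter(clusters))


# Hypothetical, already aligned sequences of equal length.
aligned_example = {
    "seq1": "AAATCGGAAA",
    "seq2": "AAATCGGATA",
    "seq3": "TTATCGCATA",
}
tree = upgma(aligned_example)
print(tree)  # ('seq3', ('seq1', 'seq2'))
```

Here seq1 and seq2 differ in only one column, so they are joined first, and seq3 is attached afterwards. If third-party libraries are allowed, a dedicated bioinformatics package is probably a better choice than this hand-rolled version.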

You can draw inside a Jupyter notebook; one example is described in "Using Turtle in Google Colab". Alternatively, you could display the tree as text, using spaces, tabs, etc. to indent the branches.
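For the text option, a small recursive function could do the indentation. This is a sketch under my own assumptions: the tree is stored as nested tuples of sequence names (a representation I picked for illustration), `render_tree` is a made-up name, and `+` marks an internal branch point.

```python
def render_tree(tree, depth=0, indent="    "):
    """Return the lines of an indented text view of a nested-tuple tree."""
    if isinstance(tree, str):
        # A leaf: just the sequence name at the current depth.
        return [indent * depth + tree]
    # An internal node: mark it, then render each child one level deeper.
    lines = [indent * depth + "+"]
    for child in tree:
        lines.extend(render_tree(child, depth + 1, indent))
    return lines


# Hypothetical tree: seq1 and seq2 are closest, seq3 joins them last.
example_tree = ("seq3", ("seq1", "seq2"))
print("\n".join(render_tree(example_tree)))
```

Each level of nesting is shifted one indent to the right, so sequences that share a `+` were grouped together.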