Terminals being renamed None with Biopython Phylo


I have been using Biopython to align some amino acid sequences with Clustal Omega and then import the guide tree it generates.

from Bio.Align.Applications import ClustalOmegaCommandline
from Bio import AlignIO
from Bio import Phylo

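# in_file, out_file, log_file and guidetree_file are file paths defined earlier.
clustalomega_cline = ClustalOmegaCommandline(
    '/path/to/clustalo', infile=in_file, outfile=out_file, log=log_file,
    guidetree_out=guidetree_file, verbose=True, auto=True, force=True)
clustalomega_cline()  # run Clustal Omega

# Load the alignment and the guide tree written by clustalo.
align = AlignIO.read(out_file, "fasta")
tree = Phylo.read(guidetree_file, "newick")
Phylo.draw(tree)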

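# Alignment record IDs that do not appear among the tree's terminal names.
terminal_names = [terminal.name for terminal in tree.get_terminals()]
print([record.id for record in align if record.id not in terminal_names])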

>['CTX-M-3', 'CTX-M-4', 'CTX-M-5', 'CTX-M-11', 'CTX-M-15', 'CTX-M-133']

# Terminals whose name was not set when the tree was parsed.
print([terminal.name for terminal in tree.get_terminals()
       if terminal.name is None])

>[None, None, None, None, None, None]

So the imported tree now has some leaves/terminals named None, and is missing an equivalent number of named leaves.

I looked at the tree file (as written by clustalo) and noticed that the genes being renamed None always have :-0 after them, e.g.:

,
(
(
CTX-M-4:-0
,
CTX-M-5:-0
):0.00171644
,
CTX-M-76:0.00171644
):0.00432852

What do the -0s mean, and how do I fix this so that all my terminals are named?

As a side note, this doesn't seem to happen when I fill my FASTA files with DNA sequences instead, align those, and import the resulting tree.
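One thing that could be tried, assuming (and this is only a guess from the observation above, not a confirmed diagnosis) that the negative-zero branch lengths are what confuse the Newick parser: rewrite every `:-0` as `:0` in the tree text before handing it to Phylo. The helper name `normalize_negative_zero` is hypothetical and uses only the standard library; it leaves real negative lengths such as `:-0.5` and hyphens inside names such as `CTX-M-4` untouched.

```python
import re

# Matches a branch length of negative zero (":-0", ":-0.0", ":-0.000") that is
# followed by a Newick delimiter, whitespace, or the end of the text.
_NEGATIVE_ZERO = re.compile(r":-(0+(?:\.0*)?)(?=[,);\s]|$)")


def normalize_negative_zero(newick_text):
    """Return newick_text with ':-0' style branch lengths rewritten without the sign."""
    return _NEGATIVE_ZERO.sub(r":\1", newick_text)


if __name__ == "__main__":
    # Sample built from the tree fragment quoted in the question.
    sample = "((CTX-M-4:-0,CTX-M-5:-0):0.00171644,CTX-M-76:0.00171644):0.00432852;"
    print(normalize_negative_zero(sample))
```

The cleaned text could then be read from memory instead of from the file, for example with `Phylo.read(StringIO(cleaned_text), "newick")`, after which the terminal names can be checked again with the comparison shown earlier. If the None names persist, the cause lies elsewhere.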
