I am using the Python package ete3. I have trees such as:
((Species1_order1,(Species2_order2,Species3_order2)),Species4_order3,Species5_order5);
I would like to find the leaves most closely related to a specific node in the tree (here the node is Species1_order1).
In the example, the most closely related leaves are Species2_order2 / Species3_order2, and Species4_order3 / Species5_order5.
Code:
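import ete3
# Adjacent string literals are concatenated, so no stray spaces end up in the Newick string
tree = ete3.Tree(
    '((Species1_order1,(Species2_order2,Species3_order2)),'
    'Species4_order3,Species5_order5);'
)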
New example:
tree = ete3.Tree('((((((A,B),C),D),(E,F)),G),(H,I));')
The result I get is:
     A    B    C    D    E    F    G    H    I
A  0.0  2.0  3.0  4.0  6.0  6.0  6.0  8.0  8.0
B  2.0  0.0  3.0  4.0  6.0  6.0  6.0  8.0  8.0
C  3.0  3.0  0.0  3.0  5.0  5.0  5.0  7.0  7.0
D  4.0  4.0  3.0  0.0  4.0  4.0  4.0  6.0  6.0
E  6.0  6.0  5.0  4.0  0.0  2.0  4.0  6.0  6.0
F  6.0  6.0  5.0  4.0  2.0  0.0  4.0  6.0  6.0
G  6.0  6.0  5.0  4.0  4.0  4.0  0.0  4.0  4.0
H  8.0  8.0  7.0  6.0  6.0  6.0  4.0  0.0  2.0
I  8.0  8.0  7.0  6.0  6.0  6.0  4.0  2.0  0.0
But, for instance, E and F are equally distant from A, B, C and D in the tree, yet in the result they appear to be closer to D.
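The values above are consistent with a plain path-length distance, i.e. the number of branches walked between two leaves with every branch counted as 1 (an assumption: the Newick string carries no branch lengths). Under that reading E looks closer to D only because D hangs fewer branches below the common ancestor than A, B or C do. A minimal sketch that reproduces those numbers, with the tree written as nested tuples (a hypothetical stand-in for the ete3 object; `leaf_paths`, `path_length` and `distance` are illustrative names, not ete3 functions):

```python
# Tree from the question as nested tuples: leaves are strings,
# internal nodes are tuples. This is an illustrative stand-in, not ete3.
TREE = (((((("A", "B"), "C"), "D"), ("E", "F")), "G"), ("H", "I"))


def leaf_paths(node, path=()):
    """Map each leaf name to the child indices leading to it from the root."""
    if isinstance(node, str):
        return {node: path}
    paths = {}
    for index, child in enumerate(node):
        paths.update(leaf_paths(child, path + (index,)))
    return paths


def path_length(path_a, path_b):
    """Branches between two leaves, each branch counted as 1."""
    shared = 0
    for step_a, step_b in zip(path_a, path_b):
        if step_a != step_b:
            break
        shared += 1  # still walking down the common part of both paths
    # What remains of each path is the climb up to the common ancestor.
    return (len(path_a) - shared) + (len(path_b) - shared)


PATHS = leaf_paths(TREE)


def distance(a, b):
    """Path-length distance between two leaf names."""
    return float(path_length(PATHS[a], PATHS[b]))


# E versus A, B, C, D: same common ancestor, different number of branches.
for other in "ABCD":
    print("E", other, distance("E", other))
```

The common ancestor of E with each of A, B, C and D is the same node; only the number of branches hanging between that node and the other leaf changes.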
A good matrix result should rather be:
   A  B  C  D  E  F  G  H  I
A  0  1  2  3  4  4  5  6  6
B  1  0  2  3  4  4  5  6  6
C  2  2  0  3  4  4  5  6  6
D  3  3  3  0  4  4  5  6  6
E  4  4  4  4  0  1  5  6  6
F  4  4  4  4  1  0  5  6  6
G  5  5  5  5  5  5  0  6  6
H  6  6  6  6  6  6  6  0  1
I  6  6  6  6  6  6  6  1  0
Isn't it?
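One way to read the matrix proposed above (an interpretation, not something stated in the question): each entry is the height of the most recent common ancestor of the two leaves, where height means the number of branches on the longest path from that ancestor down to any leaf. A minimal sketch under that assumption, again with the tree written as nested tuples rather than an ete3 object (`height`, `leaves` and `mrca_height_matrix` are hypothetical helper names):

```python
# Tree from the question as nested tuples: leaves are strings,
# internal nodes are tuples. This is an illustrative stand-in, not ete3.
TREE = (((((("A", "B"), "C"), "D"), ("E", "F")), "G"), ("H", "I"))


def height(node):
    """Branches on the longest path from this node down to a leaf."""
    if isinstance(node, str):
        return 0
    return 1 + max(height(child) for child in node)


def leaves(node):
    """Leaf names below a node, in left-to-right order."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in leaves(child)]


def mrca_height_matrix(tree):
    """Matrix whose entry [a][b] is the height of the common ancestor of a and b."""
    names = leaves(tree)
    matrix = {a: {b: 0 for b in names} for a in names}

    def visit(node):
        if isinstance(node, str):
            return
        node_height = height(node)
        groups = [leaves(child) for child in node]
        # Leaves taken from two different children meet exactly at this node.
        for i, group_a in enumerate(groups):
            for group_b in groups[i + 1:]:
                for a in group_a:
                    for b in group_b:
                        matrix[a][b] = matrix[b][a] = node_height
        for child in node:
            visit(child)

    visit(tree)
    return names, matrix


names, matrix = mrca_height_matrix(TREE)
print("  " + " ".join(names))
for a in names:
    print(a, " ".join(str(matrix[a][b]) for b in names))
```

This recomputes heights and leaf lists at every node, so it is meant for small trees like the example, not as an efficient method.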
As discussed in the comments, ete3 gives us a function called Tree.get_closest_leaf, but its output is not what is expected (and I am not sure what this value even represents here):
Instead, you can get the node distance like this:
Note: This is a suboptimal solution for several reasons, but this question is not asking for the most efficient solution. See this link for much more information about phylogenetic distance matrix methods.
This solution also uses pandas, which is overkill, since it is really just there for the convenience of row/column labels. It would not be difficult to remove the pandas dependency and do it with native lists instead.
Here is the output:
For the updates posted, I am not seeing anything wrong. It appears to give correct results. Here is the tree as rendered by ete3 (I highlighted the 4 hops that are counted in the distance from Interest_sequence to Rhopalosiphum_maidis_Hemiptera):
and here is the matrix column for Interest_sequence that corresponds to it: