Bio.Phylo.PAML.codeml's results parser quietly fails to read all the data

697 Views Asked by At

Biopython comes with methods to interface with the PAML package for phylogenetic analysis.

In particular I am using Bio.Phylo.PAML to run analyses using PAML's codeml.exe program which in my case does Ka/Ks (dN/dS) ratio analysis on pairs of orthologous gene sequences.

After running the analysis using results = cml.run() I can see it has successfully generated result.out files that look about right. Most importantly the final line of the file is what I need to parse into Python:

t= 0.2173  S=   703.9  N=  1489.1  dN/dS=  0.2247  dN = 0.0344  dS = 0.1529

What I am most interested in is dN/dS = 0.2247

According to Biopython's PAML wiki this value can be obtained from Python by doing results = cml.run() which generates a dictionary with a set of values I am interested in after running the analysis. The wiki claims I can find the values I need in a key called 'parameters'. But this only returns one of the values I need t= 0.2173, look:

>>> results.values()
['Fcodon', 'One dN/dS ratio for branches, ', '4.7b', {0: {'description': 'one-ratio', 'parameters': {'t': 0.1982}}}, {'htlv': {}, 'stlv': {}}]

Notice, that my parameters key only contains the t= 0.2173 and has omitted S= 703.9 N= 1489.1 dN/dS= 0.2247 dN = 0.0344 dS = 0.1529

Could anybody with codeml experience explain to me why the parser fails to yield most of the parameters (values) I am interested in?

Extra details

  • Using Python2.7, PAML4.7a
  • Running on Windows 7
  • I will readily edit in any data or info you require to help me fix this
1

There are 1 best solutions below

1
On

Cross-posting from https://www.biostars.org/p/89848/:

You can obtain omega values for each branch using the following commands:

from Bio.Phylo.PAML import codeml

results = codeml.read(paml_outputfile)

print results["NSsites"][0]["parameters"]["omega"]

This gives you a list of omega (dn/ds) for each branch

using version 4.7b and biopython-1.6.3