I am new to accessing Entrez through Biopython and a couple of R packages (rentrez and reutil). When I query the 'nuccore' database with esummary, the output fields returned by Biopython differ from those returned by the R packages.
Python:
from Bio import Entrez
handle = Entrez.esearch(db='nuccore', term='183844[GPRJ]', retmax=75000)
record = Entrez.read(handle)
id_list = record["IdList"]
search_results = Entrez.read(Entrez.epost("nuccore", id=",".join(id_list), restart=1, retmax=10000))
webenv = search_results["WebEnv"]
query_key = search_results["QueryKey"]
handle1 = Entrez.esummary(db="nuccore", query_key=query_key, WebEnv=webenv)
record1 = Entrez.read(handle1)
The fields returned by Biopython are:
['AccessionVersion','Caption','Comment','CreateDate','Extra','Flags','Gi','Id', 'Item','Length','ReplacedBy','Status','TaxId','Title','UpdateDate']
R (reutil package):
trak <- esearch('183844[GPRJ]', "nuccore", usehistory=TRUE, retmax = 70000)
query_key <- 1
web_env <- "NCID_1_224566406_130.14.18.34_9001_1496371219_1582367639_0MetA0_S_MegaStore_F_1"
esum <- esummary(db="nuccore", querykey = query_key, webenv = web_env, retstart = 1, retmax = 10000)
gtrkr <- content(esum, "parsed")
The fields returned by the R packages reutil and rentrez are (esummary result with 31 items):
['uid', 'caption', 'title', 'extra', 'gi', 'createdate', 'updatedate', 'flags', 'taxid', 'slen', 'biomol', 'moltype', 'topology', 'sourcedb', 'segsetsize', 'projectid', 'genome', 'subtype', 'subname', 'assemblygi', 'assemblyacc', 'tech', 'completeness', 'geneticcode', 'strand', 'organism', 'strain', 'biosample', 'statistics', 'properties', 'oslt']
Thanks in advance.
Coming to this late, but as a past contributor to biopython and maintainer of rentrez I feel I need to explain what is going on here. Biopython is accessing "version 1.0" esummary records by default, and the R packages are fetching "version 2.0" records. There is a brief discussion about the differences between these records in the rentrez help page.
And just to demonstrate, changing this argument (version, in rentrez's entrez_summary) reproduces the results from Biopython.
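To make the difference between the two record versions concrete, here is a small sketch that compares the two field lists quoted in the question. It makes no network calls; the lists are copied from the question, and `compare_fields` is just an illustrative helper name, not part of Biopython or rentrez.

```python
# Field names copied from the question: Biopython (version 1.0 records)
V1_FIELDS = ['AccessionVersion', 'Caption', 'Comment', 'CreateDate', 'Extra',
             'Flags', 'Gi', 'Id', 'Item', 'Length', 'ReplacedBy', 'Status',
             'TaxId', 'Title', 'UpdateDate']

# Field names copied from the question: reutil/rentrez (version 2.0 records)
V2_FIELDS = ['uid', 'caption', 'title', 'extra', 'gi', 'createdate',
             'updatedate', 'flags', 'taxid', 'slen', 'biomol', 'moltype',
             'topology', 'sourcedb', 'segsetsize', 'projectid', 'genome',
             'subtype', 'subname', 'assemblygi', 'assemblyacc', 'tech',
             'completeness', 'geneticcode', 'strand', 'organism', 'strain',
             'biosample', 'statistics', 'properties', 'oslt']


def compare_fields(v1, v2):
    """Compare two lists of field names, ignoring case.

    Returns three sorted lists: shared names, names only in v1,
    and names only in v2.
    """
    a = {name.lower() for name in v1}
    b = {name.lower() for name in v2}
    return sorted(a & b), sorted(a - b), sorted(b - a)


shared, only_v1, only_v2 = compare_fields(V1_FIELDS, V2_FIELDS)
print(len(shared), "shared:", shared)
print(len(only_v1), "only in version 1.0:", only_v1)
print(len(only_v2), "only in version 2.0:", only_v2)
```

Ignoring case, 8 names are shared, 7 appear only in the version 1.0 list and 23 only in the version 2.0 list. Some of the unmatched names look like renames (for example Length and slen, or Id and uid), but that is a guess from the names alone.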
Edit -- getting version 2.0 records with Biopython
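A minimal sketch of how this might look, assuming that the keyword arguments given to `Entrez.esummary` are forwarded to E-utilities as URL parameters, so that `version="2.0"` requests the newer records. The helper names `esummary_params` and `fetch_summaries` and the e-mail address are placeholders of mine. Passing `validate=False` to `Entrez.read` is a precaution, and whether the version 2.0 XML parses cleanly may depend on your Biopython release, so treat this as untested.

```python
def esummary_params(ids, db="nuccore", version="2.0"):
    """Build the keyword arguments for an ESummary request.

    Leaving `version` out (version=None) gives the default
    version 1.0 records described above.
    """
    params = {"db": db, "id": ",".join(ids)}
    if version is not None:
        params["version"] = version
    return params


def fetch_summaries(ids, email, db="nuccore", version="2.0"):
    """Fetch ESummary records with Biopython (needs network access)."""
    # Imported here so esummary_params can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.esummary(**esummary_params(ids, db=db, version=version))
    try:
        # validate=False is an assumption: it skips DTD validation in case
        # the version 2.0 document type is not known to the parser.
        return Entrez.read(handle, validate=False)
    finally:
        handle.close()


if __name__ == "__main__":
    print(esummary_params(["1", "2"]))
    # To query NCBI, call for example:
    # fetch_summaries(id_list[:10], email="you@example.org")
```

With the question's code, `id_list[:10]` would be the first ten IDs returned by the esearch call.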