I am new to accessing Entrez through Biopython and a couple of R packages (rentrez and reutil). When I query the 'nuccore' database with esummary, the fields returned by Biopython differ from those returned by the R packages.
Python:
from Bio import Entrez

# Search nuccore for records matching 183844[GPRJ]
handle = Entrez.esearch(db='nuccore', term='183844[GPRJ]', retmax=75000)
record = Entrez.read(handle)
id_list = record["IdList"]
# Post the IDs to the history server, then fetch summaries via WebEnv/query_key
search_results = Entrez.read(Entrez.epost("nuccore", id=",".join(id_list), restart=1, retmax=10000))
webenv = search_results["WebEnv"]
query_key = search_results["QueryKey"]
handle1 = Entrez.esummary(db="nuccore", query_key=query_key, WebEnv=webenv)
record1 = Entrez.read(handle1)
The fields returned by Biopython are:
['AccessionVersion','Caption','Comment','CreateDate','Extra','Flags','Gi','Id', 'Item','Length','ReplacedBy','Status','TaxId','Title','UpdateDate']
R (reutil package):
trak <- esearch('183844[GPRJ]', "nuccore", usehistory=TRUE, retmax = 70000)
query_key <- 1
web_env <- "NCID_1_224566406_130.14.18.34_9001_1496371219_1582367639_0MetA0_S_MegaStore_F_1"
esum <- esummary(db="nuccore", querykey = query_key, webenv = web_env, retstart = 1, retmax = 10000)
gtrkr <- content(esum, "parsed")
The fields returned by the R packages reutil and rentrez are (esummary result with 31 items):
['uid', 'caption', 'title', 'extra', 'gi', 'createdate', 'updatedate', 'flags', 'taxid', 'slen', 'biomol', 'moltype', 'topology', 'sourcedb', 'segsetsize', 'projectid', 'genome', 'subtype', 'subname', 'assemblygi', 'assemblyacc', 'tech', 'completeness', 'geneticcode', 'strand', 'organism', 'strain', 'biosample', 'statistics', 'properties', 'oslt']
Thanks in advance.
To explain the Biopython example:
Now this should confirm there are 1000 entries (matching retmax), and each has 15 fields:
That should give:
As an aside, I'm not sure what the 'Item' empty list is from.
Let's check the actual raw XML for the first record using retmax=1:
This gives:
i.e. the exact same fields Biopython's Entrez parser is giving you as keys (plus the 'Id' and that 'Item' empty list which puzzled me above).
Are you sure you are comparing like with like here?
Could you give a specific example accession where your R solution has extra data?
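For anyone who wants to repeat the raw-XML comparison described above without hitting NCBI, here is a rough offline sketch. It assumes the DocSum-style layout (an Id element followed by named Item elements); the XML below is a hand-written stand-in with placeholder values, not real NCBI output, and the names SAMPLE_XML, BIOPYTHON_KEYS and docsum_field_names are made up for this illustration. The key list is copied from the question.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for one esummary DocSum record. Values are
# placeholders; only the element layout and the Item names matter here.
SAMPLE_XML = """<eSummaryResult>
<DocSum>
  <Id>0</Id>
  <Item Name="Caption" Type="String">placeholder</Item>
  <Item Name="Title" Type="String">placeholder</Item>
  <Item Name="Extra" Type="String">placeholder</Item>
  <Item Name="Gi" Type="Integer">0</Item>
  <Item Name="CreateDate" Type="String">placeholder</Item>
  <Item Name="UpdateDate" Type="String">placeholder</Item>
  <Item Name="Flags" Type="Integer">0</Item>
  <Item Name="TaxId" Type="Integer">0</Item>
  <Item Name="Length" Type="Integer">0</Item>
  <Item Name="Status" Type="String">placeholder</Item>
  <Item Name="ReplacedBy" Type="String"></Item>
  <Item Name="Comment" Type="String"></Item>
  <Item Name="AccessionVersion" Type="String">placeholder</Item>
</DocSum>
</eSummaryResult>"""

# Keys reported by Biopython's parser, copied from the question.
BIOPYTHON_KEYS = {
    "AccessionVersion", "Caption", "Comment", "CreateDate", "Extra",
    "Flags", "Gi", "Id", "Item", "Length", "ReplacedBy", "Status",
    "TaxId", "Title", "UpdateDate",
}


def docsum_field_names(xml_text):
    """Return the Name attribute of every Item in the first DocSum."""
    root = ET.fromstring(xml_text)
    first = root.find("DocSum")
    return [item.get("Name") for item in first.findall("Item")]


xml_names = set(docsum_field_names(SAMPLE_XML))
# Keys the parser reports that are not named Items in the XML
print(sorted(BIOPYTHON_KEYS - xml_names))   # ['Id', 'Item']
# Named Items in the XML that the parser did not report
print(sorted(xml_names - BIOPYTHON_KEYS))   # []
```

To run this against real data, pass the text read from the esummary handle to docsum_field_names instead of SAMPLE_XML. Doing the same with the raw response your R call receives would show whether both sides are looking at the same XML.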