How do I download a large number of GenBank sequences using entrez_fetch in R?

I am trying to download sequence data from 1283 records in GenBank using rentrez. I'm using the following code, first to search for records fitting my criteria, then linking across databases, and finally fetching the sequence data:

# Search for sequence ids in biosample database
search <- entrez_search(db = "biosample",
                        term = "Escherichia coli[Organism] AND geo_loc_name=USA:WA[attr]",
                        retmax = 9999, use_history = TRUE)

search$ids
length(search$ids)
search$web_history


# Link IDs across databases: biosample to nuccore (nucleotide sequences)
nuc_links <- entrez_link(dbfrom = "biosample",
                         id = search$web_history,
                         db = "nuccore",
                         by_id = TRUE)
nuc_links$links

# Fetch nucleotide sequences
fetch_ids1 <- entrez_fetch(db = "nucleotide",
                           id = nuc_links$links$biosample_nuccore,
                           rettype = "xml")

When I run this for a single record, I get the data I need. When I try to scale it up to all the records by using the web history from my search, it fails: nuc_links$links is NULL, which suggests that entrez_link is not returning links the way I expect. Can anyone show me where I'm going wrong?
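
One direction that may be worth trying, offered as a rough sketch rather than a confirmed fix: entrez_link has a separate web_history argument, so passing search$web_history to id is probably not what the function expects, and, as I read the rentrez documentation, by_id = TRUE returns a list of elink objects (one per ID), so a top-level $links would be NULL in any case. The sketch below avoids both by sending the plain search$ids in batches. The helper names (chunk_ids, link_in_batches, fetch_in_batches), the batch size of 200, the half-second pause, and rettype = "fasta" are all assumptions made for illustration, not anything rentrez prescribes.

```r
# Split a vector of IDs into batches of at most `size` elements.
chunk_ids <- function(ids, size = 200) {
  if (length(ids) == 0) return(list())
  unname(split(ids, ceiling(seq_along(ids) / size)))
}

# Link BioSample IDs to nuccore one batch at a time and
# collect the unique nuccore IDs that come back.
link_in_batches <- function(biosample_ids, size = 200) {
  linked <- lapply(chunk_ids(biosample_ids, size), function(batch) {
    res <- rentrez::entrez_link(dbfrom = "biosample", id = batch, db = "nuccore")
    res$links$biosample_nuccore  # NULL when a batch has no linked records
  })
  unique(unlist(linked))
}

# Fetch nuccore records one batch at a time.
# Returns a character vector with one element per batch.
fetch_in_batches <- function(nuccore_ids, size = 200, rettype = "fasta") {
  vapply(chunk_ids(nuccore_ids, size), function(batch) {
    Sys.sleep(0.5)  # assumed pause to stay under NCBI's request rate limit
    rentrez::entrez_fetch(db = "nuccore", id = batch, rettype = rettype)
  }, character(1))
}
```

With the search object from above, this would be called as nuc_ids <- link_in_batches(search$ids) followed by seqs <- fetch_in_batches(nuc_ids). Since retmax = 9999 exceeds the 1283 records, search$ids should already hold every ID, so the web history is not strictly needed here.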
