I'm trying to download DNA sequence data from NCBI using entrez_fetch. With the following code, I perform a search for the IDs of the sequences I need with entrez_search, and then I attempt to download the sequence data in FASTA format:
library(rentrez)

# Search for sequence ids
search <- entrez_search(db = "biosample",
                        term = "Escherichia coli[Organism] AND geo_loc_name=USA:WA[attr]",
                        retmax = 9999, use_history = TRUE)
search$ids
length(search$ids)
search$web_history

# Download sequence data
ecoli_fasta <- entrez_fetch(db = "nuccore",
                            web_history = search$web_history,
                            rettype = "fasta")
When I do this, I get the following error:
Error: HTTP failure: 400
Cannot+retrieve+query+from+history
I don't understand what this means and Googling hasn't led me to an answer.
I tried using a different package (ape) and the function read.GenBank to download the sequences as an alternative, but this method only managed to download about 1000 of the 12000 sequences I needed. I would like to use entrez_fetch if possible. Does anyone have any insight for me?
This may be a starter.
Also be aware that queries to genome databases can return massive amounts of data, so be sure to limit your queries.
Build search web history
Use web history to fetch data
Use a loop to cycle through sequences, e.g
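As a rough sketch of the three steps above, something like the following could work. It assumes the likely cause of the 400 error is that the web history was built against "biosample" but then used to fetch from "nuccore", so here the search and the fetch both target "nuccore". The helper names batch_starts and fetch_fasta_batches, the search term, the output file name, the batch size, and the record cap are all illustrative assumptions, not part of rentrez or of the original question.

```r
library(rentrez)

# Hypothetical helper: 0-based retstart offsets for fetching `total`
# records in chunks of `batch_size`.
batch_starts <- function(total, batch_size) {
  if (total <= 0) return(integer(0))
  as.integer(seq(0, total - 1, by = batch_size))
}

# Hypothetical wrapper (not a rentrez function): search nuccore with
# web history, then fetch FASTA records in batches and append to a file.
fetch_fasta_batches <- function(term, out_file,
                                batch_size = 100L, max_records = 500L) {
  # Step 1: build the web history in the same database we fetch from
  search <- entrez_search(db = "nuccore", term = term, use_history = TRUE)

  # Cap the download so a broad query cannot pull everything
  total <- as.integer(min(search$count, max_records))

  # Steps 2 and 3: loop over offsets, reusing the web history each time
  for (start in batch_starts(total, batch_size)) {
    n <- as.integer(min(batch_size, total - start))
    recs <- entrez_fetch(db = "nuccore",
                         web_history = search$web_history,
                         rettype = "fasta",
                         retstart = start,
                         retmax = n)
    cat(recs, file = out_file, append = TRUE)
    Sys.sleep(0.5)  # short pause between requests to be polite to NCBI
  }
  invisible(total)
}

# Example call (needs network access; the term is only an assumption):
# fetch_fasta_batches("Escherichia coli[Organism]", "ecoli.fasta")
```

If the sequences really have to be selected by BioSample attributes, the BioSample hits would first need to be mapped to nuccore records (rentrez offers entrez_link for cross-database links), and the resulting history, rather than the BioSample one, would be passed to entrez_fetch.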