How do I provide my Identity / Email when connecting to NCBI through Rentrez?

My project head is telling me that it's unacceptable to connect to NCBI to retrieve sequence entries without sending along identifying information, such as our institution email. They claim that if we do this, NCBI won't instantly block our connection when we violate their user guidelines; they'll email us first. We are using RStudio with the rentrez package to retrieve protein sequences from NCBI GenBank.

But I'm not certain that's necessary, or whether rentrez even has a way to do that. For reference, this is the general format of our code:

sequence <- entrez_fetch(db="nuccore", id=accession_number, rettype="fasta")

The rentrez documentation says: "The NCBI will ban IPs that don't use EUtils within their user guidelines. In particular
1. Don't send more than three request per second (rentrez enforces this limit)
2. If you plan on sending a sequence of more than ~100 requests, do so outside of peak times for the US
3. For large requests use the web history method (see examples for entrez_search or use entrez_post to upload IDs)"

Both entrez_search and entrez_post include an argument called "web_history", described as "A web_history object for use in subsequent calls to NCBI". I'm not sure this is what I'm looking for, though.

I can't find any arguments or functions etc. which allow the user to send identifying information to NCBI when connecting.

There is 1 best solution below

It seems like you need an API key. You can get one interactively from your NCBI account, and it needs to be specified in your .bash_profile (at least on a Mac using bash; I'm not sure which OS or shell you use).

For command line usage it just needs to be set as a variable with the following line added to your profile:

export NCBI_API_KEY=<yourkeyhere>

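Then, as long as R loads that profile when it starts, you should be fine. Note that rentrez itself looks for the key in the ENTREZ_KEY environment variable, so that is the name R needs to see (for example, via a line in your .Renviron file).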
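For rentrez specifically, a minimal sketch of supplying the key from inside R might look like the following. It assumes your installed version of rentrez matches the package documentation, where set_entrez_key() stores the key in the ENTREZ_KEY environment variable and individual calls accept an api_key argument. "yourkeyhere" is a placeholder, not a real key, and accession_number is the placeholder from the question.

```r
library(rentrez)

# Placeholder value: replace with the key from your NCBI account settings.
my_key <- "yourkeyhere"

# Store the key for the current R session. rentrez keeps it in the
# ENTREZ_KEY environment variable and uses it for later requests.
set_entrez_key(my_key)

# Confirm that the session can see the key.
Sys.getenv("ENTREZ_KEY")

# One-off alternative (left commented out because it needs network access
# and a real key): pass the key on a single call.
# sequence <- entrez_fetch(db = "nuccore", id = accession_number,
#                          rettype = "fasta", api_key = my_key)
```

To avoid setting the key in every session, the same value can go in your .Renviron file as a line of the form ENTREZ_KEY=yourkeyhere, which R reads at startup.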

EDIT: A somewhat tangential note: you can grab files from the FTP site with utilities like curl and wget, or even with Biostrings functions like readDNAStringSet(), without an API key. For E-utilities, you only need a key if you go over the allowed number of queries per second (the guidelines quoted in the question give three per second); if you stay under that threshold, I don't think they care that much.