Rentrez pubmed search - order by descending publication date

Using the Rentrez package in R, I want to search a list of drugs and find the date of the earliest publication mentioning each. My strategy is as follows:

# Search for pubmed IDs for a drug
drug_name <- "aspirin"
search_query <- paste0(drug_name, "[Title/Abstract]")
search_results <- entrez_search(db = "pubmed", term = search_query, sort = "pub_date", retmax = 1000)

# Get the oldest article ID (the last element, since results come back newest first)
oldest_article_id <- tail(search_results$ids, 1)

The problem here is that the function only sorts the results in descending order (most recent first). One option would be to increase 'retmax' to return all of the results and select the last value. However, some of the drugs give more results than the maximum value of retmax.

The Rentrez documentation does not give any option for ascending results, though perhaps there is an undocumented way to do this through the API. Otherwise I will need to identify a totally different strategy such as scraping the web site.

There are 1 best solutions below

This takes a bit of coding. One approximate approach is to repeat the search over progressively older date ranges.

# Search for pubmed IDs for a drug
drug_name <- "aspirin"
search_query <- paste0(drug_name, "[TITL]")
search_results <- entrez_search(db = "pubmed", term = search_query, 
                            sort = "pub_date", retmax = 9999)

You can tell that there are more results than the maximum value of retmax because the two numbers below differ:

search_results$count #hits in NCBI Pubmed
length(search_results$ids) #hits in the R object 
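
If you want to automate that check, it can be wrapped in a small helper. This is only a sketch: `is_truncated` is a hypothetical name, not part of rentrez, and the `mock` list merely imitates the `ids` and `count` fields of an `entrez_search()` result so the snippet runs without a network call.

```r
# Hypothetical helper: TRUE when the search matched more records
# than were actually returned in the R object.
is_truncated <- function(search_results) {
  search_results$count > length(search_results$ids)
}

# Mock object with the same two fields as an entrez_search() result,
# used here only for illustration.
mock <- list(ids = as.character(1:3), count = 10L)
is_truncated(mock)
```

With a real result you would call `is_truncated(search_results)` instead of using the mock.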

Then get the publication date of the last element in the R object (here, 2007).

search_summ <- entrez_summary(db = "pubmed", id = tail(search_results$ids, 1))
extract_from_esummary(search_summ, c("uid", "pubdate"), simplify = TRUE)

$uid
[1] "17431000"

$pubdate
[1] "2007 Apr"

Repeat the search, each time using the year just found as the new upper bound, until you get all available hits. (This could be wrapped in a function so that the search runs programmatically.)

# New search restricted to a date range (keep sort = "pub_date" so the last ID is the oldest)
search_query <- "aspirin[TITL] AND 1800:2007[EDAT]"
search_results <- entrez_search(db = "pubmed", term = search_query,
                                sort = "pub_date", retmax = 9999)
search_summ <- entrez_summary(db = "pubmed", id = tail(search_results$ids, 1))
extract_from_esummary(search_summ, c("uid", "pubdate"), simplify = TRUE)

$pubdate
[1] "1969 Jan"

# New search restricted to an older date range
search_query <- "aspirin[TITL] AND 1800:1969[EDAT]"
search_results <- entrez_search(db = "pubmed", term = search_query,
                                sort = "pub_date", retmax = 9999)

# Did we get all the available entries?
identical(length(search_results$ids), search_results$count)
[1] TRUE

# Publication date of the last (oldest) element
search_summ <- entrez_summary(db = "pubmed", id = tail(search_results$ids, 1))
extract_from_esummary(search_summ, c("uid", "pubdate"), simplify = TRUE)

$pubdate
[1] "1902 Dec 27"
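
The manual steps above could be wrapped in a loop along the following lines. Treat it as a rough sketch rather than a tested solution: `build_query`, `pubdate_year` and `find_oldest` are hypothetical names of my own, the rentrez package is assumed to be installed, and the code assumes every `pubdate` string starts with a four-digit year. It stops with an error if the date window fails to shrink (for example, when a single year has more hits than `retmax`).

```r
# Hypothetical helpers, not part of rentrez.

# Build a title query restricted to the years from..to, as in the manual steps.
build_query <- function(drug, from, to) {
  sprintf("%s[TITL] AND %d:%d[EDAT]", drug, from, to)
}

# Extract the year from a pubdate string such as "2007 Apr".
# Assumption: the string starts with a four-digit year.
pubdate_year <- function(pubdate) {
  as.integer(substr(pubdate, 1, 4))
}

# Narrow the upper year step by step until one search returns every hit;
# the last ID of that search is then the oldest record.
find_oldest <- function(drug, retmax = 9999, first_year = 1800) {
  upper <- as.integer(format(Sys.Date(), "%Y"))
  repeat {
    res <- rentrez::entrez_search(
      db = "pubmed", term = build_query(drug, first_year, upper),
      sort = "pub_date", retmax = retmax
    )
    if (length(res$ids) == 0) return(NULL)

    summ <- rentrez::entrez_summary(db = "pubmed", id = tail(res$ids, 1))

    # Everything fitted into one result set, so we are done.
    if (length(res$ids) >= res$count) {
      return(list(uid = summ$uid, pubdate = summ$pubdate))
    }

    # Otherwise use the year of the oldest hit seen so far as the new bound.
    new_upper <- pubdate_year(summ$pubdate)
    if (is.na(new_upper) || new_upper >= upper) {
      stop("Date window did not shrink; try a finer date range.")
    }
    upper <- new_upper
  }
}
```

For the list of drugs in the question, something like `lapply(drug_names, find_oldest)` would then run the search for each one, where `drug_names` is your own character vector.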