Taxonomy Extraction from Text Data in R

I want to extract a taxonomy from a large raw text corpus that contains many abbreviations.

There is an R package called taxize, which lets users search many taxonomic data sources for species names.

library('taxize')

# Get immediate children of Salmo
children("Salmo", db = 'ncbi')

#> $Salmo
#>    childtaxa_id                   childtaxa_name childtaxa_rank
#> 1       1509524  Salmo marmoratus x Salmo trutta        species
#> 2       1484545 Salmo cf. cenerinus BOLD:AAB3872        species
# 

# Get synonyms
synonyms("Acer drummondii", db="itis")

My question: is it possible to use taxize (or an alternative package) to extract a taxonomy from text data that contains many abbreviations? For example, how can I find the immediate children of a specific abbreviation or concept that occurs frequently in my text but is not listed in taxonomic data sources such as "ncbi" and "itis"?

I would appreciate any comments and answers.

Thanks, Sam
