I want to extract a taxonomy from a large raw text corpus that contains many abbreviations.
There is an R package called taxize that lets users search many taxonomic data sources for species names. For example:
library('taxize')
# Get immediate children of Salmo
children("Salmo", db = 'ncbi')
#> $Salmo
#>   childtaxa_id                   childtaxa_name childtaxa_rank
#> 1      1509524  Salmo marmoratus x Salmo trutta        species
#> 2      1484545 Salmo cf. cenerinus BOLD:AAB3872        species
#
# Get synonyms
synonyms("Acer drummondii", db="itis")
My question: is it possible to use taxize (or an alternative package) to extract a taxonomy from text data that contains many abbreviations? For example, how can I find the immediate children of a specific abbreviation or concept that occurs frequently in my text but is not listed in taxonomic data sources such as "ncbi" and "itis"?
I would appreciate any comments and answers.
Thanks, Sam