How-to issue: spaCy mentions that ELMo and BERT are very effective for NLP tasks when you have little data, because both have very good transfer learning properties.
My question: transfer learning relative to what model? If you have a language model for dogs, is it easier to find a good language model for kangaroos? (My case is biology-related and involves a lot of terminology.)
Well, BERT and ELMo are trained on huge corpora (BERT is trained on 16GB of raw text). This implies that the embeddings these models produce are generic, so the language-model knowledge they capture carries over to most downstream tasks.
Since your task is biology-related, you can have a look at alternatives such as BioBERT (https://arxiv.org/abs/1901.08746), a BERT variant further pretrained on biomedical text.
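
If it helps, here is a rough sketch of how you might pull sentence embeddings out of a BioBERT checkpoint with the Hugging Face transformers library. A few things here are my assumptions, not something from the paper or from spaCy: the checkpoint name `dmis-lab/biobert-base-cased-v1.1`, the helper names `mean_pool` and `embed`, and the choice of mean pooling over tokens (other pooling strategies may suit your task better).

```python
from typing import List, Sequence

# Assumption: a BioBERT checkpoint hosted on the Hugging Face Hub.
# Swap in whichever checkpoint you actually use.
MODEL_NAME = "dmis-lab/biobert-base-cased-v1.1"


def mean_pool(vectors: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose mask entry is non-zero."""
    kept = [v for v, m in zip(vectors, mask) if m]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(v[i] for v in kept) / len(kept) for i in range(dim)]


def embed(sentence: str, model_name: str = MODEL_NAME) -> List[float]:
    """Return one mean-pooled vector for `sentence` (hypothetical helper)."""
    # Imported lazily so mean_pool stays usable without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)

    # last_hidden_state has shape (batch, tokens, hidden); take the single item.
    vectors = outputs.last_hidden_state[0].tolist()
    mask = inputs["attention_mask"][0].tolist()
    return mean_pool(vectors, mask)
```

Calling `embed("some sentence from your biology corpus")` downloads the checkpoint on first use and returns a plain list of floats. Passing `model_name="bert-base-cased"` instead gives you the generic model, so you can compare both on your own data.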