Dear all,
I use pubmed.mineR (Text Mining of PubMed Abstracts) to extract gene symbols from PubMed abstracts (texts).
There are some gene symbols like:
- can (https://www.uniprot.org/uniprotkb/P61517/entry)
- thiS (https://www.uniprot.org/uniprotkb/O32583/entry)
that are also English words.
Are you aware of any smart method to discriminate between an abstract describing the gene "can" and an abstract where "can" is just an English verb?
Thank you all in advance.
I tried simple word frequency, which gives me a list of "problematic" gene symbols.
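
A minimal sketch of such a frequency check, written in Python rather than with pubmed.mineR. The symbol list, the toy abstracts, the function name `flag_problematic_symbols`, and the 0.5 threshold are all illustrative assumptions, and it counts the fraction of abstracts containing a symbol (document frequency) as a stand-in for word frequency:

```python
import re
from collections import Counter

# Hypothetical inputs: a few gene symbols and toy abstracts (not real PubMed data).
gene_symbols = ["can", "thiS", "BRCA1"]
abstracts = [
    "We can show that BRCA1 mutations can alter DNA repair.",
    "This pathway can be blocked; this effect is dose dependent.",
    "In this study we characterized the can gene.",
]

def flag_problematic_symbols(symbols, texts, min_fraction=0.5):
    """Return symbols whose lowercase form occurs in at least min_fraction of texts."""
    doc_counts = Counter()
    for text in texts:
        # Case-insensitive token set, so "This" and "thiS" are treated alike.
        tokens = set(re.findall(r"[A-Za-z0-9]+", text.lower()))
        for symbol in symbols:
            if symbol.lower() in tokens:
                doc_counts[symbol] += 1
    return [s for s in symbols if doc_counts[s] / len(texts) >= min_fraction]

problematic = flag_problematic_symbols(gene_symbols, abstracts)
print(problematic)  # ['can', 'thiS']
```

This only flags symbols that are suspiciously common; it does not tell me whether a given abstract uses "can" as the gene or as the verb, which is the part I am asking about.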