I created classifier to classy the class of nouns,adjectives, Named entities in given sentence. I have used large Wikipedia dataset for classification.
Like :
Where Abraham Lincoln was born?
So classifier will give this short of result - word - class
- Where - question
- Abraham Lincoln - Person, Movie, Book (because classifier find Abraham Lincoln in all there categories)
- born - time
When Titanic was released?
- when - question
- Titanic - Song, movie, Vehicle, Game (Titanic classified in all these categories)
Is there any way to identify exact context for word?
Please see :
- Word sense disambiguation would not help here. Because there might not be near by word in sentence which can help
Lesk algorithm with wordnet or sysnet also does not help. Because it for suppose word
Bank
lesk algo will behave like this======== TESTING simple_lesk ===========
TESTING simple_lesk() ...
Context: I went to the bank to deposit my money
Sense: Synset('depository_financial_institution.n.01')
Definition: a financial institution that accepts deposits and channels the money into lending activities
TESTING simple_lesk() with POS ...
Context: The river bank was full of dead fishes
Sense: Synset('bank.n.01')
Definition: sloping land (especially the slope beside a body of water)
Here for word bank
it suggested as financial institute
and slopping land
. While in my case I am already getting such prediction like Titanic
then it can be movie
or game
.
I want to know is there any other approach apart from Lesk algo
, baseline algo
, traditional word sense disambiguation
which can help me to identify which class is correct for particular keyword?
Titanic -
Thanks for using the
pywsd
examples. With regards to wsd, there are many other variants and i'm coding them by myself during my free time. So if you want to see it improve do join me in coding the open source tool =)Meanwhile, you will find the following technologies more relevant to your task, such as:
Knowledge base population (http://www.nist.gov/tac/2014/KBP/) where tokens/segments of text are assigned an entity and the task is to link them or to solve a simplified question and answer task.
Knowledge representation (http://groups.csail.mit.edu/medg/ftp/psz/k-rep.html)
Knowledge extraction (https://en.wikipedia.org/wiki/Knowledge_extraction)
The above technologies usually includes several sub-tasks such as:
Essentially you're asking for a tool that is an NP-complete AI system for language/text processing, so I don't really think such a tool exists as of yet. Maybe it's IBM Watson.
if you're looking for the field to look into, the field is out there but if you're looking at tools, most probably wikification tools are closest to what you might need. (http://nlp.cs.rpi.edu/paper/WikificationProposal.pdf)