Seven-class classifier not giving the desired results with StanfordNLP in Python


I am trying to use Stanford's Named Entity Recognizer. I want to use the 7-class classifier because I also want to detect times (or dates) and other entities in a sentence. When I enter the sentence:

"He was born on October 15, 1931 at Dhanushkothi in the temple town Rameshwaram in Tamil Nadu."

into the online demo on the Stanford NLP site (http://nlp.stanford.edu:8080/ner/process), it is classified correctly, as the screenshot of the demo output shows:

(Image: output of the Stanford online demo for the above sentence)

But when I run the code on my own system using NLTK and StanfordNERTagger, I get the wrong result. The output is:

[(u'He', u'O'), (u'was', u'O'), (u'born', u'O'), (u'on', u'O'), (u'1931-10-15', u'O'), 
(u'at', u'O'), (u'Dhanushkothi', u'O'), (u'in', u'O'), (u'the', u'O'), 
(u'temple', u'O'), (u'town', u'O'), (u'Rameshwaram', u'O'), (u'in', u'O'), 
(u'Tamil', u'ORGANIZATION'), (u'Nadu', u'ORGANIZATION'), (u'.', u'O')]

It tags the date as O (other), and it even tags Tamil Nadu as an ORGANIZATION instead of a LOCATION. Here is the code I used:

from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.tag import StanfordNERTagger

st = StanfordNERTagger('english.muc.7class.distsim.crf.ser.gz','stanford-ner.jar')

i= "He was born on October 15, 1931 at Dhanushkothi in the temple town Rameshwaram in Tamil Nadu."

words = nltk.word_tokenize(i)
namedEnt = st.tag(words)

print namedEnt 

Can anyone tell me what mistake I'm making (if any), or suggest another way to identify locations and times in a sentence? I'm a beginner in NLP, and any help would be appreciated.

Answer:

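I ran your code and found a problem with how word_tokenize is called: it is imported from nltk.tokenize but then called as nltk.word_tokenize, and nltk itself is never imported, so that line raises a NameError. Calling the imported word_tokenize directly works.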

Try this code:

from nltk import word_tokenize
from nltk.tag import StanfordNERTagger

# Arguments: the 7-class model file, then the path to stanford-ner.jar
st = StanfordNERTagger('english.muc.7class.distsim.crf.ser.gz', 'stanford-ner.jar')

i = "He was born on October 15, 1931 at Dhanushkothi in the temple town Rameshwaram in Tamil Nadu."

# Call the imported word_tokenize directly; the bare name "nltk" is not defined here
words = word_tokenize(i)
namedEnt = st.tag(words)

print(namedEnt)

Here is my output:

[(u'He', u'O'), (u'was', u'O'), (u'born', u'O'), (u'on', u'O'), (u'October', u'DATE'), (u'15', u'DATE'), (u',', u'DATE'), (u'1931', u'DATE'), (u'at', u'O'), (u'Dhanushkothi', u'O'), (u'in', u'O'), (u'the', u'O'), (u'temple', u'O'), (u'town', u'O'), (u'Rameshwaram', u'O'), (u'in', u'O'), (u'Tamil', u'ORGANIZATION'), (u'Nadu', u'ORGANIZATION'), (u'.', u'O')]
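Since the goal is to pull dates, times and locations out of a sentence, it can help to merge consecutive tokens that share a tag into entity spans. The following is a minimal sketch in plain Python (no Stanford or NLTK dependency), run on the tagged output shown above. The helpers group_entities and entities_of_type are hypothetical names for illustration, not part of NLTK. It assumes, as in the output above, plain labels with no B-/I- prefixes, so two adjacent entities of the same type would be merged into one.

```python
from itertools import groupby

# Tagged output copied from the answer: (token, tag) pairs as returned by
# StanfordNERTagger.tag(); "O" means "not part of any entity".
tagged = [
    ("He", "O"), ("was", "O"), ("born", "O"), ("on", "O"),
    ("October", "DATE"), ("15", "DATE"), (",", "DATE"), ("1931", "DATE"),
    ("at", "O"), ("Dhanushkothi", "O"), ("in", "O"), ("the", "O"),
    ("temple", "O"), ("town", "O"), ("Rameshwaram", "O"), ("in", "O"),
    ("Tamil", "ORGANIZATION"), ("Nadu", "ORGANIZATION"), (".", "O"),
]


def group_entities(tagged_tokens):
    """Hypothetical helper: merge runs of adjacent tokens sharing a non-"O" tag.

    Returns a list of (entity_text, tag) tuples. Because the labels carry no
    B-/I- boundary markers, adjacent entities of the same type get merged.
    """
    entities = []
    for tag, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if tag == "O":
            continue  # skip tokens outside any entity
        tokens = [token for token, _ in run]
        entities.append((" ".join(tokens), tag))
    return entities


def entities_of_type(tagged_tokens, wanted):
    """Hypothetical helper: keep only entities whose tag is in `wanted`."""
    return [(text, tag) for text, tag in group_entities(tagged_tokens) if tag in wanted]


print(group_entities(tagged))
print(entities_of_type(tagged, {"DATE", "TIME", "LOCATION"}))
```

Here the date comes out as one span, "October 15 , 1931", with the spaces introduced by tokenization; rejoin punctuation yourself if you need the original surface string. Tamil Nadu is still returned as ORGANIZATION, because the filter can only work with the tags the model actually produced.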