Stanford NER is not properly extracting percentages

112 Views Asked by Khaleeque Ansari At 29 July 2025 at 07:03

I'm trying to extract percentages with Stanford NER, but it is not extracting them properly.

from nltk.tag import StanfordNERTagger

inp_str = 'total revenue received was one hundred and twenty five percent 125% for last financial year'
split_inp_str = inp_str.split()
st = StanfordNERTagger('english.muc.7class.distsim.crf.ser.gz')
print(st.tag(split_inp_str))

This gives the following output:

[('total', 'O'), ('revenue', 'O'), ('received', 'O'), ('was', 'O'), ('one', 'O'), ('hundred', 'O'), ('and', 'O'), ('twenty', 'O'), ('five', 'PERCENT'), ('percent', 'PERCENT'), ('125%', 'O'), ('for', 'O'), ('last', 'O'), ('financial', 'O'), ('year', 'O')]

Why is it not extracting 125% or one hundred and twenty five percent?

There is 1 best solution below

Anish On 12 January 2017 at 00:13

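You need to tokenize the sentence with a proper tokenizer rather than splitting it on whitespace with split(). split() leaves 125% as a single token, whereas word_tokenize separates it into 125 and %. Try the following code.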

from nltk import word_tokenize
from nltk.tag import StanfordNERTagger

inp_str = 'total revenue received was one hundred and twenty five percent 125% for last financial year'
# word_tokenize separates '125%' into '125' and '%'; split() keeps it as one token
split_inp_str = word_tokenize(inp_str)
st = StanfordNERTagger('english.muc.7class.distsim.crf.ser.gz')
print(st.tag(split_inp_str))
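
To see what changes between the two approaches without installing anything, here is a minimal sketch of the token difference for this sentence. Note that split_percent is a hypothetical helper written only for illustration; it is not part of NLTK or Stanford NER, and it imitates just the one thing word_tokenize does that matters here (detaching a trailing % from a number). It does not run the tagger, so it says nothing about the labels you will get back.

```python
import re

def split_percent(tokens):
    """Hypothetical helper: detach a trailing '%' from numeric tokens.

    A rough stand-in for one effect of a Treebank-style tokenizer,
    e.g. '125%' -> '125', '%'. All other tokens pass through unchanged.
    """
    out = []
    for tok in tokens:
        m = re.fullmatch(r"(\d+(?:\.\d+)?)%", tok)
        if m:
            out.extend([m.group(1), "%"])
        else:
            out.append(tok)
    return out

inp_str = 'total revenue received was one hundred and twenty five percent 125% for last financial year'

whitespace_tokens = inp_str.split()                 # what the question passes to the tagger
adjusted_tokens = split_percent(whitespace_tokens)  # number and '%' as separate tokens

print(whitespace_tokens[10])    # the single token '125%'
print(adjusted_tokens[10:12])   # the two tokens '125' and '%'
```

In practice you would still call word_tokenize as in the answer above; the sketch only makes visible that the tagger receives a different token sequence in each case.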
