I'm working on an NER task using flair. I noticed that flair sometimes introduces extra spaces when it processes a sentence.
For example, the input sentence `Herman Melvilles email is mobydick123@gmail.com ;-)` gives the output `[PERSON_NAME] email is mobydick123 @ gmail.com ;-)` instead of `[PERSON_NAME] email is mobydick123@gmail.com ;-)`.
How can I fix that?
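The output pattern, with a space on both sides of `@`, is consistent with text rebuilt by joining tokens with single spaces. The sketch below uses a hypothetical `Token` stand-in (not flair's own class) with a `whitespace_after` flag, and a hand-made token list that mirrors the split visible in the output above. It contrasts a plain space join with a join that restores a space only where the original text had one.

```python
from dataclasses import dataclass


@dataclass
class Token:
    # Hypothetical stand-in: token text plus whether the original had a space after it
    text: str
    whitespace_after: bool


def naive_join(tokens):
    # Put a space between every pair of tokens, regardless of the original spacing
    return " ".join(t.text for t in tokens)


def original_join(tokens):
    # Re-insert a space only where the original text had one
    return "".join(t.text + (" " if t.whitespace_after else "") for t in tokens).rstrip()


# Hand-made tokens mirroring the split seen in the question's output
tokens = [
    Token("email", True), Token("is", True),
    Token("mobydick123", False), Token("@", False), Token("gmail.com", True),
    Token(";-)", False),
]
print(naive_join(tokens))
print(original_join(tokens))
```

The first call reproduces the unwanted `mobydick123 @ gmail.com`; the second keeps the address intact because it consults the per-token whitespace information.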
```python
import re

from flair.data import Sentence
from flair.models import SequenceTagger

# Load the OntoNotes NER model (PERSON is one of its entity types)
tagger = SequenceTagger.load("flair/ner-english-ontonotes-large")

line = 'Herman Melvilles email is mobydick123@gmail.com ;-)'
sentence = Sentence(line)
tagger.predict(sentence)

sentence1 = sentence
if len(sentence.get_spans('ner')) > 0:
    for entry in sentence.get_spans('ner'):
        if 'PERSON' in str(entry):
            # Extract the quoted entity text from the span's string form
            person = re.findall('"([^"]*)"', str(entry))
            sentence1 = str(sentence1).replace(str(person[0]), "[PERSON_NAME]")
    # Return the output sequence: the quoted sentence text inside str(sentence1)
    try:
        sentence1 = re.findall('"([^"]*)"', sentence1)[0]
    except (IndexError, TypeError):
        # No quoted text found, or no PERSON span (sentence1 is still a Sentence)
        sentence1 = line
else:
    sentence1 = line
print(sentence1)
```
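One possible way around this, sketched under the assumption that you can obtain each entity's character offsets in the original string: replace the spans directly in `line` instead of searching the tokenized text inside `str(sentence)`. The `mask_spans` helper and the hard-coded span list are hypothetical. Recent flair versions appear to expose such offsets on each `Span` (e.g. `start_position` / `end_position`), but check the attribute names for your version.

```python
def mask_spans(text, spans, placeholder_for):
    """Replace (start, end, label) character spans in the ORIGINAL text.

    Working on the raw string rather than the tokenized one keeps the
    original spacing. Spans are processed right-to-left so that the
    offsets of earlier spans stay valid after each replacement.
    """
    out = text
    for start, end, label in sorted(spans, key=lambda s: s[0], reverse=True):
        replacement = placeholder_for.get(label)
        if replacement is None:
            continue  # leave entity types we do not mask untouched
        out = out[:start] + replacement + out[end:]
    return out


line = "Herman Melvilles email is mobydick123@gmail.com ;-)"
# Hypothetical span list: "Herman Melvilles" occupies characters 0..16.
# With flair you would build this from each predicted span's offsets and label.
spans = [(0, 16, "PERSON")]
print(mask_spans(line, spans, {"PERSON": "[PERSON_NAME]"}))
```

Because only the entity's character range is rewritten, everything outside it, including the email address, is copied verbatim from the input.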