I have a list of sentences, e.g. sentences = ["Mary likes Facebook", "Chris likes Whatsapp"]
I want to create a list of dictionaries that extracts entities and their types from all of these sentences. For example:
[
    {'entity': 'Mary', 'type': 'PERS'},
    {'entity': 'Facebook', 'type': 'ORG'},
    {'entity': 'Chris', 'type': 'PERS'},
    {'entity': 'Whatsapp', 'type': 'ORG'}
]
At the moment I'm using nested for loops with Flair to achieve this:

    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load("ner")

    entity_list = []
    for sent in sentences:
        sentence = Sentence(sent)
        tagger.predict(sentence)
        for entity in sentence.get_spans("ner"):
            entity_list.append(
                {
                    "entity": entity.text,
                    "type": entity.tag
                }
            )
Is there a way to optimise the above and reduce the time complexity?
I don't think you can get away from the nested loops or reduce the time complexity, but perhaps you can use
multiprocessing
to speed up wall-clock time by parallelising the NER tagging of the sentences.