BERT NER detect multiple words as one entity


I'm using BERT to train a custom NER model with the simpletransformers package. I have 2 custom entities: place and other.

In the dataset, the words column holds multiple words for a single label in one row, e.g.:

Sentence_id | words         | labels
17          | united states | place
17          | south Africa  | place

For example, I have the sentence: Hi I'm XYZ from United states

When predicting, the model outputs a label for each word separately. I want the model to treat 2 words as one entity. E.g. instead of united, it should return united states as the entity.

Is there any way or configuration to pass the number of words (n-grams) that the model should consider?


1 Answer


I'm not familiar with simpletransformers, but it looks like it only provides one label per token. What you can do in that case is label the first token of an entity B-[LABEL] and any following tokens I-[LABEL], with tokens outside any entity labeled O. This is known as IOB tagging.
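As a rough sketch of that relabeling, the snippet below splits multi-word rows into one row per token and assigns B-/I- tags. The function name `to_iob_rows`, the tuple layout (sentence_id, words, label), and the use of "O" for non-entity words are assumptions for illustration, not anything simpletransformers requires by name.

```python
def to_iob_rows(rows, outside_label="O"):
    """Split multi-word rows into one row per token with IOB labels.

    `rows` is a list of (sentence_id, words, label) tuples, where `words`
    may contain several whitespace-separated tokens (hypothetical layout).
    """
    iob_rows = []
    for sentence_id, words, label in rows:
        for i, token in enumerate(words.split()):
            if label == outside_label:
                tag = outside_label      # non-entity tokens stay "O"
            elif i == 0:
                tag = f"B-{label}"       # first token of the entity
            else:
                tag = f"I-{label}"       # any following token of the entity
            iob_rows.append((sentence_id, token, tag))
    return iob_rows


# The sentence from the question, with the place entity on a single row.
rows = [
    (17, "Hi", "O"),
    (17, "I'm", "O"),
    (17, "XYZ", "O"),
    (17, "from", "O"),
    (17, "United states", "place"),
]

iob_rows = to_iob_rows(rows)
for row in iob_rows:
    print(row)
```

With data in this shape, the set of labels the model is trained on would be O, B-place and I-place rather than place alone.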

It's kind of weird that you have to do that manually, though. For most NER systems that should be automatic. You can see an example of automatically handling multi-word entities in the spaCy course.
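If the model still returns one tag per token at prediction time, the B-/I- tags can be merged back into multi-word entities afterwards. This is a minimal sketch of that post-processing step; `merge_iob` and the (token, tag) input format are hypothetical, so the model's actual prediction output would need to be converted into this shape first.

```python
def merge_iob(tagged_tokens):
    """Group consecutive B-/I- tagged tokens into (entity_text, label) pairs."""
    entities = []
    current_tokens, current_label = [], None

    def flush():
        # Store the entity collected so far, if there is one.
        if current_tokens:
            entities.append((" ".join(current_tokens), current_label))

    for token, tag in tagged_tokens:
        if tag.startswith("B-"):
            flush()
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            # "O", or an I- tag with no matching B- before it
            # (such stray I- tokens are dropped in this sketch).
            flush()
            current_tokens, current_label = [], None
    flush()
    return entities


predicted = [
    ("Hi", "O"), ("I'm", "O"), ("XYZ", "O"), ("from", "O"),
    ("United", "B-place"), ("states", "I-place"),
]
entities = merge_iob(predicted)
print(entities)
```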