Hi all,
I have a NER model fine-tuned from BERT on sentences annotated with the BIO tagging scheme.
Here is an example of my training data.
"['The', 'raw', 'datafiles', 'used', 'in', 'this', 'study', 'were', 'obtained', 'from', 'the', 'EMBL', '-', 'EBI', 'ArrayExpress', '[', '70', ']', ',', 'or', 'NCBI', 'Gene', 'Expression', 'Omnibus', '(', 'GEO', ')', '[', '71', ']', 'websites', '.']",
"['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-Operation', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']"
My prediction results are at the token level in BIO format. However, there are some issues with the output:
- a mixture of different BIO tags: below there is a `Means` tag inserted into the `Data` tag:

  the (B-Data) mo (B-Means) fa (I-Means) 2 (I-Means) model (I-Data)
- a prediction result starts with an `I-` tag rather than a `B-` tag:

  blinded scoring (I-Operation) of
I wonder why this is happening and how best to resolve it. Thanks.
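For now, the only workaround I can think of is to repair the word-level tag sequence after prediction, e.g. turning a dangling `I-X` into `B-X` so the spans at least become valid. A minimal sketch of that idea is below (the function and the example sequences are illustrative, not my actual pipeline), but I would rather understand the root cause than just patch the output.

```python
def repair_bio(tags):
    """Turn any I-X that does not continue an entity of type X into B-X."""
    fixed = []
    prev = "O"
    for tag in tags:
        if tag.startswith("I-") and (prev == "O" or prev[2:] != tag[2:]):
            # dangling or type-switching I- tag: start a new entity instead
            tag = "B-" + tag[2:]
        fixed.append(tag)
        prev = tag
    return fixed

# the two problem cases from above
print(repair_bio(["B-Data", "B-Means", "I-Means", "I-Means", "I-Data"]))
# -> ['B-Data', 'B-Means', 'I-Means', 'I-Means', 'B-Data']
print(repair_bio(["I-Operation", "O"]))
# -> ['B-Operation', 'O']
```

Is this kind of post-processing the usual way to handle it, or does the mixed/dangling tagging point to a problem in how I align labels to subword tokens during training?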