I have already labeled a dataset using Dataturks to train a spaCy
NER model, and everything works fine. However, I just realized that Flair
uses a different format, and I am wondering whether there is a way to convert my spaCy NER JSON dataset into the Flair
format:
George N B-PER
Washington N I-PER
went V O
to P O
Washington N B-LOC
However, the spaCy format is as follows:
[("George Washington went to Washington",
{'entities': [(0, 6,'PER'),(7, 17,'PER'),(26, 36,'LOC')]})]
Flair uses the BILUO scheme, with an empty line between sentences, so you would need to use biluo_tags_from_offsets.
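The conversion described above can be sketched roughly as follows. This is a dependency-free stand-in, not spaCy's own helper: it assumes tokens are separated by whitespace only, and the function names offsets_to_biluo and to_column_format are made up for illustration. spaCy's helper relies on the pipeline's tokenizer instead, so results can differ around punctuation.

```python
import re

def offsets_to_biluo(text, entities):
    """Return (tokens, tags), tagging whitespace tokens with BILUO labels."""
    # Assumption: tokens are whitespace-separated (no punctuation splitting).
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        # Indices of the tokens that lie fully inside the entity span.
        inside = [i for i, (_, s, e) in enumerate(tokens) if s >= start and e <= end]
        if not inside:
            continue  # span does not align with any token; leave as "O"
        if len(inside) == 1:
            tags[inside[0]] = "U-" + label
        else:
            tags[inside[0]] = "B-" + label
            for i in inside[1:-1]:
                tags[i] = "I-" + label
            tags[inside[-1]] = "L-" + label
    return [tok for tok, _, _ in tokens], tags

def to_column_format(data):
    """One 'token tag' pair per line, with an empty line between sentences."""
    blocks = []
    for text, annotations in data:
        tokens, tags = offsets_to_biluo(text, annotations["entities"])
        blocks.append("\n".join(f"{tok} {tag}" for tok, tag in zip(tokens, tags)))
    return "\n\n".join(blocks) + "\n"

TRAIN_DATA = [("George Washington went to Washington",
               {"entities": [(0, 6, "PER"), (7, 17, "PER"), (26, 36, "LOC")]})]

print(to_column_format(TRAIN_DATA))
```

Because the sample annotation marks "George" and "Washington" as two separate PER spans, each comes out as U-PER. Annotating the full name as one span, (0, 17, 'PER'), would give B-PER followed by L-PER.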
Note that to train just NER, this seems to be enough. If you wish to add POS tagging, you would need to create a mapping from the Universal POS tags to Flair's simplified scheme.
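A minimal sketch of such a mapping is shown below. The table UPOS_TO_SIMPLE and the fallback tag "X" are assumptions chosen only to reproduce the N/V/P column from the question; they are not an official Flair tag set. The (token, POS, NER) rows are written by hand here, whereas in practice they would come from the tagger and the NER conversion.

```python
# Hypothetical mapping from Universal POS tags to the short tags
# used in the question's example (N, V, P); extend as needed.
UPOS_TO_SIMPLE = {
    "NOUN": "N",
    "PROPN": "N",
    "VERB": "V",
    "AUX": "V",
    "ADP": "P",
}

def simplify_pos(upos, default="X"):
    """Map a Universal POS tag to a simplified tag, falling back to `default`."""
    return UPOS_TO_SIMPLE.get(upos, default)

def to_three_columns(rows):
    """Format (token, upos, ner_tag) rows as 'token pos ner' lines."""
    return "\n".join(f"{tok} {simplify_pos(pos)} {tag}" for tok, pos, tag in rows)

# Hand-written illustration rows for the sentence from the question.
rows = [
    ("George", "PROPN", "B-PER"),
    ("Washington", "PROPN", "I-PER"),
    ("went", "VERB", "O"),
    ("to", "ADP", "O"),
    ("Washington", "PROPN", "B-LOC"),
]

print(to_three_columns(rows))
```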