Unlabeled instances in doccano and spaCy. Do they offer any value?

209 Views Asked by yury_gurevich At 27 July 2025 at 12:28

I am using doccano for sequence labelling and spaCy for further modelling. Some of the sentences I label do not contain any of the labels I am interested in, so they remain "unlabeled", i.e. they have no tags.

{"id": 79, "data": "This powerful charm would protect him until he became of age, or no longer called his aunt's house home.", "label": []}
{"id": 82, "data": "He began attending Hogwarts School of Witchcraft and Wizardry in 1991.", "label": []}
{"id": 85, "data": "He later became the youngest Quidditch Seeker in over a century and eventually the captain of the Gryffindor House Quidditch Team in his sixth year, winning two Quidditch Cups.", "label": []}

I want to train spaCy to recognise character names in all their variations.

Now the questions:

Is there any value in including unlabeled instances when training a spaCy model?
If there is, should I treat this data as an imbalanced dataset and act accordingly (boosting, SMOTE, over-sampling, etc.)?
What are the best practices in cases like this?


There is 1 best solution below

polm23 On 12 June 2021 at 08:59

Yes, you need to include some examples where nothing is labelled so the model can learn what not to label. For example, if in all your sample sentences all capitalized words are labelled, the model might learn to always label capitalized words.
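As a rough illustration of keeping those empty examples, here is a minimal sketch that turns doccano JSONL lines into `(text, {"entities": [...]})` tuples, the shape spaCy training examples are commonly built from. It assumes each record has a "data" field and a "label" list of `[start, end, label_name]` triples; the question only shows empty lists, so that triple layout, the helper names `doccano_to_spacy` and `unlabeled_share`, the `CHARACTER` label and the record with id 90 are all assumptions made up for the example.

```python
import json


def doccano_to_spacy(lines):
    """Convert doccano JSONL lines to (text, {"entities": [...]}) tuples.

    Assumes "label" is a list of [start, end, label_name] triples;
    an empty list means the sentence contains no entities.
    """
    examples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        record = json.loads(line)
        entities = [(start, end, name) for start, end, name in record["label"]]
        # Keep the record even when entities is empty: it is a negative
        # example that shows the model what not to label.
        examples.append((record["data"], {"entities": entities}))
    return examples


def unlabeled_share(examples):
    """Fraction of examples that carry no entity annotations."""
    if not examples:
        return 0.0
    empty = sum(1 for _, ann in examples if not ann["entities"])
    return empty / len(examples)


sample = [
    # Taken from the question: no entities of interest.
    '{"id": 82, "data": "He began attending Hogwarts School of Witchcraft and Wizardry in 1991.", "label": []}',
    # Hypothetical labelled record, invented for this sketch.
    '{"id": 90, "data": "Harry met Ron on the train.", "label": [[0, 5, "CHARACTER"], [10, 13, "CHARACTER"]]}',
]

examples = doccano_to_spacy(sample)
print(examples[0][1])             # {'entities': []}
print(unlabeled_share(examples))  # 0.5
```

Checking the share of empty examples is only a quick sanity check on how many negative sentences the training set contains; it does not by itself say whether any resampling is needed.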
