Dealing with missing or unknown features when tagging items using CRF model (CRFSuite)

404 Views Asked by Avolith At 17 August 2025 at 17:40

I'm using CRFSuite (via the python-crfsuite bindings) to build a named-entity extractor, similar to the tutorial at http://nbviewer.ipython.org/github/tpeng/python-crfsuite/blob/master/examples/CoNLL%202002.ipynb. The training input is a sequence of words, each of which has a number of features.

The problem is that for my specific use-case, I don't always have the features of the entities that I'm trying to recognise. I want the CRF model to recognise the entity based on the features of the surrounding words. However, when I simply input an empty dict {} as a word's features, the named entities are never properly classified as such.

I'm wondering if there is a feature or standard method to handle such cases, where after training a model, one does not always have features for all items.


There is 1 best solution below

mhbashari On 13 July 2015 at 14:49

Assigning a fixed placeholder value such as "-" or "+" to missing features can be useful in some cases.
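A minimal sketch of that idea, in plain Python so it runs without CRFSuite installed. The feature names (`lower`, `postag`), the `MISSING` constant, and both helper functions are hypothetical and only for illustration. The sketch also copies each neighbour's features onto the current word with `-1:` / `+1:` prefixes, in the style of the CoNLL 2002 tutorial, so a word with no features of its own still carries information from its context. The same transformation would have to be applied to the training data as well, otherwise the model never learns weights for the placeholder.

```python
MISSING = "-"  # hypothetical placeholder for an unknown feature value
FEATURE_KEYS = ("lower", "postag")  # hypothetical feature names


def fill_missing(features, keys=FEATURE_KEYS, placeholder=MISSING):
    """Return a dict with every expected key, using the placeholder for absent ones."""
    return {key: features.get(key, placeholder) for key in keys}


def sequence_features(raw_seq):
    """Build per-word feature dicts that include the neighbours' features."""
    filled = [fill_missing(f) for f in raw_seq]
    out = []
    for i, own in enumerate(filled):
        feats = dict(own)
        if i > 0:
            # Features of the previous word, prefixed so they stay distinct.
            feats.update({"-1:" + k: v for k, v in filled[i - 1].items()})
        else:
            feats["BOS"] = True  # beginning of sequence
        if i < len(filled) - 1:
            # Features of the next word.
            feats.update({"+1:" + k: v for k, v in filled[i + 1].items()})
        else:
            feats["EOS"] = True  # end of sequence
        out.append(feats)
    return out


raw = [
    {"lower": "visited", "postag": "VBD"},
    {},  # the entity: no features known for this word
    {"lower": "yesterday", "postag": "NN"},
]
xseq = sequence_features(raw)
# xseq is a list of dicts, one per word, intended to be passed to
# pycrfsuite.Trainer.append(xseq, yseq) or pycrfsuite.Tagger.tag(xseq).
```

Here the middle word ends up with `lower` and `postag` set to `"-"` plus the prefixed features of "visited" and "yesterday", rather than an empty dict. Whether this helps depends on how often the placeholder also appears on entity words in the training set.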
