custom feature function with (python) crfsuite

4k Views Asked by At

I have only read theorey about CRF so far and want to use python crfsuite in my master thesis for extracting ingredients from recipes. Every help is appreciated.

As far as I understand, I can provide training data to crfsuite in the form of the picture below, where w[0] provides the identity of the current word, w[i] the world relative to i and pos[i] its part-of-speech-tag relative to i.

training data format

And then crfsuite trains its own feature functions build on the given attributes.

But I can't find a way for providing custom feature functions like "w[i] is in a dictionary" (for example a dictionary of recipe ingredients) or "in the sentence is a negation" (for example "not", or "don't").

In general good tutorials are appreciated, because the manuals (https://python-crfsuite.readthedocs.io/en/latest/ or http://www.chokkan.org/software/crfsuite/manual.html) are not beginner-friendly from my point of view

1

There are 1 best solutions below

1
On

With python-crfsuite (or sklearn-crfsuite) training data doesn't have to be in the form you've described; a single training sequence should be a list of {"feature_name": <feature_value>"} dicts, with features for each sequence elements (e.g. for a token in a sentence). Features don't have to be words or POS tags. There is a few other supported feature formats (see http://python-crfsuite.readthedocs.io/en/latest).

For a more complete example check https://github.com/TeamHG-Memex/sklearn-crfsuite/blob/master/docs/CoNLL2002.ipynb - it uses custom features.