I have only read theory about CRFs so far and want to use python-crfsuite in my master's thesis to extract ingredients from recipes. Any help is appreciated.
As far as I understand, I can provide training data to crfsuite in the format shown in the picture below, where w[0] is the identity of the current word, w[i] is the word at offset i, and pos[i] is the part-of-speech tag of the word at offset i.
CRFsuite then trains its own feature functions built on the given attributes.
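In python-crfsuite terms, the attribute layout described above can be written as one dict per token. This is a minimal sketch that assumes a symmetric window (±2 by default) and key names like `w[-1]` chosen only to mirror the notation. When a dict value is a string, python-crfsuite treats it as a feature named `key=value` with weight 1.0.

```python
def window_features(words, pos_tags, window=2):
    """Build one attribute dict per token, mirroring the w[i] / pos[i] templates.

    The window size and key names are illustrative assumptions.
    """
    n = len(words)
    sequence = []
    for t in range(n):
        attrs = {}
        for i in range(-window, window + 1):
            j = t + i
            # Skip offsets that fall outside the sentence.
            if 0 <= j < n:
                attrs[f"w[{i}]"] = words[j]
                attrs[f"pos[{i}]"] = pos_tags[j]
        sequence.append(attrs)
    return sequence
```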
But I can't find a way to provide custom feature functions like "w[i] is in a dictionary" (for example, a dictionary of recipe ingredients) or "the sentence contains a negation" (for example, "not" or "don't").
In general, pointers to good tutorials are appreciated, because the manuals (https://python-crfsuite.readthedocs.io/en/latest/ or http://www.chokkan.org/software/crfsuite/manual.html) are not beginner-friendly from my point of view.
With python-crfsuite (or sklearn-crfsuite), training data doesn't have to be in the form you've described; a single training sequence should be a list of
{"feature_name": <feature_value>}
dicts, with features for each sequence element (e.g. for a token in a sentence). Features don't have to be words or POS tags. There are a few other supported feature formats (see http://python-crfsuite.readthedocs.io/en/latest). For a more complete example, check https://github.com/TeamHG-Memex/sklearn-crfsuite/blob/master/docs/CoNLL2002.ipynb; it uses custom features.
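Here is a minimal sketch of the two features from the question, assuming python-crfsuite is installed (`pip install python-crfsuite`). The ingredient set, negation words, feature names, and label names are hypothetical placeholders. The dictionary lookup and negation check are plain Python run while building each token's dict, so CRFsuite only sees their results as features.

```python
# Hypothetical word lists; in a real project these would be loaded from files.
INGREDIENTS = {"flour", "sugar", "butter", "salt", "egg", "eggs", "milk"}
NEGATIONS = {"not", "no", "don't", "without"}


def sent2features(tokens):
    """Return one feature dict per token, including two custom features."""
    # Sentence-level feature: computed once, then copied into every token's dict.
    has_negation = any(t.lower() in NEGATIONS for t in tokens)
    features = []
    for i, token in enumerate(tokens):
        word = token.lower()
        feats = {
            "bias": 1.0,
            "word.lower": word,                         # string -> "word.lower=<word>"
            "word.isdigit": token.isdigit(),
            "in_ingredient_dict": word in INGREDIENTS,  # "w[0] is in a dictionary"
            "sent_has_negation": has_negation,          # "the sentence contains a negation"
        }
        if i > 0:
            prev = tokens[i - 1].lower()
            feats["-1:word.lower"] = prev
            feats["-1:in_ingredient_dict"] = prev in INGREDIENTS
        else:
            feats["BOS"] = True  # beginning of sentence
        if i < len(tokens) - 1:
            feats["+1:word.lower"] = tokens[i + 1].lower()
        else:
            feats["EOS"] = True  # end of sentence
        features.append(feats)
    return features


def train_and_tag(train_sents, model_path="ingredients.crfsuite"):
    """Train a CRF on (tokens, labels) pairs and return an opened Tagger."""
    # Imported here so sent2features can be used without python-crfsuite installed.
    import pycrfsuite

    trainer = pycrfsuite.Trainer(verbose=False)
    for tokens, labels in train_sents:
        trainer.append(sent2features(tokens), labels)
    # L-BFGS (the default algorithm) with L1/L2 regularisation; values are arbitrary.
    trainer.set_params({"c1": 0.1, "c2": 0.01, "max_iterations": 50})
    trainer.train(model_path)

    tagger = pycrfsuite.Tagger()
    tagger.open(model_path)
    return tagger
```

With python-crfsuite installed, `train_and_tag([(tokens, labels), ...])` returns a `Tagger`, and `tagger.tag(sent2features(tokens))` gives one predicted label per token. Any other custom feature works the same way: compute it in Python and add it as a key to the token's dict.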