I'm very new to CRFs and I want to use CRFsuite to tag words. I have read CRFsuite's manual and understand the format of the training data, but if I want to add features that depend on the tags of nearby words, what should the training data file look like?
I have googled around but found nothing about this problem.
The short answer is that you supply attributes of the word coffee (like w[-1]=drank to indicate the previous word) and its label (NOUN), and CRFsuite generates the actual indicator functions that compose the CRF model (including a feature that indicates that the label of the previous word is VERB). It knows to do this because it uses a "1st-order Markov CRF with dyad features," as described on the manual page you linked to.
One distinction that's important to make (and that the documentation could be more precise about) is the difference between "features" and "attributes": attributes are the observations you supply for each word, whereas features are links in the model that represent either (attribute, label) or (label, label) pairs.
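To make the distinction concrete, here is a rough Python sketch that lists the (attribute, label) and (label, label) pairs occurring in one labelled sequence. The sentence "I drank coffee", its labels, and the function name enumerate_features are assumptions made up for illustration. This only mimics the bookkeeping; it is not CRFsuite's code, and CRFsuite also learns a weight for every feature.

```python
from collections import Counter

# Hypothetical training sequence: one (attributes, label) pair per word
# of the assumed sentence "I drank coffee".
sequence = [
    (["w[0]=I", "w[1]=drank"], "PRON"),
    (["w[-1]=I", "w[0]=drank", "w[1]=coffee"], "VERB"),
    (["w[-1]=drank", "w[0]=coffee"], "NOUN"),
]


def enumerate_features(seq):
    """Count the (attribute, label) and (label, label) pairs seen in seq."""
    state = Counter()       # state features: (attribute, label)
    transition = Counter()  # transition features: (previous label, label)
    prev_label = None
    for attrs, label in seq:
        for attr in attrs:
            state[(attr, label)] += 1
        if prev_label is not None:
            transition[(prev_label, label)] += 1
        prev_label = label
    return state, transition


state_features, transition_features = enumerate_features(sequence)

for attr, label in sorted(state_features):
    print("state:     ", attr, "->", label)
for prev, label in sorted(transition_features):
    print("transition:", prev, "-->", label)
```

Note that the input never mentions the previous word's label; the (label, label) pairs come purely from the order of the labels in the sequence.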
So in your example, w[-1]=drank is an attribute that you supply. The combination (w[-1]=drank, NOUN) is a state feature and the transition between labels VERB --> NOUN is a transition feature, both of which are generated by CRFsuite.
I recommend the tutorial, which discusses this in more detail.
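As for what the file itself looks like, here is a minimal Python sketch that turns tagged sentences into CRFsuite's text format: one line per word with the label first, then tab-separated attributes, and an empty line after each sequence. The sentence, the helper names, and the __BOS__, __EOS__ and __COLON__ strings are my own choices for illustration (attributes are arbitrary strings). The colon is replaced because CRFsuite reads a colon as the separator between an attribute name and an optional numeric value.

```python
def safe(text):
    # A bare ':' would be read as "name:value", so swap it for a placeholder.
    # The placeholder string is an arbitrary choice, not something CRFsuite requires.
    return text.replace(":", "__COLON__")


def word_attributes(words, i):
    """Attributes for the word at position i: the word and its neighbours."""
    attrs = ["w[0]=" + safe(words[i])]
    # Previous word, or a beginning-of-sentence marker for the first word.
    attrs.append("w[-1]=" + safe(words[i - 1]) if i > 0 else "__BOS__")
    # Next word, or an end-of-sentence marker for the last word.
    attrs.append("w[1]=" + safe(words[i + 1]) if i < len(words) - 1 else "__EOS__")
    return attrs


def to_crfsuite(tagged_sentences):
    """Render sentences of (word, label) pairs as CRFsuite training text."""
    out = []
    for sentence in tagged_sentences:
        words = [word for word, _ in sentence]
        for i, (_, label) in enumerate(sentence):
            out.append("\t".join([label] + word_attributes(words, i)) + "\n")
        out.append("\n")  # an empty line ends the sequence
    return "".join(out)


# Hypothetical example sentence with assumed labels.
training_text = to_crfsuite([[("I", "PRON"), ("drank", "VERB"), ("coffee", "NOUN")]])
print(training_text, end="")
```

The line for coffee starts with NOUN and carries w[-1]=drank as a plain attribute. Nothing in the file says "the previous label is VERB"; CRFsuite works that out from the label on the preceding line.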