I'm trying to build a Chinese word segmenter like the one in this paper. If I understand it correctly, they use a 2-tag segmentation approach with CRF++. My question is: how do I express the tag transition from that paper (e.g. T(-1)C(0)T(0)) as a feature template in CRF++? For example, for training data such as:
共 B
同 M
創 B
造 M
美 B <- Current word
好 M
的 B
新 B
世 B
紀 M
Is it possible to have the feature T(-1)C(0)T(0) -> M/美/B in CRF++? I've tried adding a feature template such as U01:%x[-1,1]/%x[0,0]/%x[0,1], but that failed. I am also confused, because the B/I tag is exactly what we want to predict for the test data (e.g. raw Chinese sentences), so how can the paper use the tag as a feature? Or have I misunderstood something?
A feature like T(-1)C(0)T(0) -> M/美/B can be expressed in CRF++ with a bigram template:
B01:%x[0,0]
Note the difference: the prefix is B, not U.
If you use U01:%x[0,0] instead, you get a unigram feature like "美/B", which pairs the character with the current tag only.
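To make the macro part concrete, here is a rough Python sketch of how a %x[row,col] macro could be expanded at the position of "美". It is not CRF++'s actual code: the function name `expand` and the "<PAD>" marker are made up for illustration, and it assumes (as far as I understand CRF++) that macros can only see the observation columns, not the answer tag column. Under that assumption, %x[0,1] has nothing to point at when the data has a single observation column.

```python
import re

# Observation columns only (column 0 = character), taken from the question.
# The tag column is the answer column, so it is deliberately left out here.
SENTENCE = [["共"], ["同"], ["創"], ["造"], ["美"], ["好"], ["的"], ["新"], ["世"], ["紀"]]

# Matches macros of the form %x[row,col], where row may be negative.
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def expand(template, rows, pos):
    """Replace every %x[row,col] macro with the observation at pos + row."""
    def lookup(match):
        row, col = int(match.group(1)), int(match.group(2))
        i = pos + row
        if i < 0 or i >= len(rows):
            return "<PAD>"  # hypothetical padding marker, not CRF++'s own string
        return rows[i][col]  # raises IndexError if col is not an observation column
    return MACRO.sub(lookup, template)


pos = 4  # index of "美"
print(expand("U01:%x[0,0]", SENTENCE, pos))           # U01:美
print(expand("B01:%x[0,0]", SENTENCE, pos))           # B01:美
print(expand("U02:%x[-1,0]/%x[0,0]", SENTENCE, pos))  # U02:造/美
```

The U and B templates expand to the same observation string; what differs is which tags get attached to it afterwards (current tag only for U, previous and current tag for B).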
This also confused me a bit when I first used CRF++ six years ago. I hope this helps.
I should mention that in a CRF, the definition of a feature includes the label. For example, the following is a 0-1 (binary) feature: the current character is "美" and the current label is "B".
What a "template" in CRF++ (a tool that implements CRFs) does is enumerate all labels for the context defined in the template.
So in your example, U01:%x[0,0] automatically introduces two features for the character "美": "U01:美_y=B" and "U01:美_y=M".
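The enumeration can be sketched in a few lines of Python. This is only an illustration of the idea, not CRF++ internals: the function names are hypothetical, and the "_y-1=" naming for the bigram case is my own extension of the "_y=" notation used above. With two labels, a unigram context yields 2 features and a bigram context yields 2 x 2 = 4, one of which is the M/美/B transition from the question.

```python
from itertools import product

LABELS = ["B", "M"]  # the two tags used in the question's training data


def unigram_features(expanded):
    """Pair an expanded U-template string with every possible current label."""
    return [f"{expanded}_y={y}" for y in LABELS]


def bigram_features(expanded):
    """Pair an expanded B-template string with every (previous, current) label pair."""
    return [
        f"{expanded}_y-1={prev}_y={cur}"  # hypothetical naming scheme
        for prev, cur in product(LABELS, repeat=2)
    ]


print(unigram_features("U01:美"))
# ['U01:美_y=B', 'U01:美_y=M']
print(bigram_features("B01:美"))
# ['B01:美_y-1=B_y=B', 'B01:美_y-1=B_y=M', 'B01:美_y-1=M_y=B', 'B01:美_y-1=M_y=M']
```

Because the labels are enumerated by the tool during training and decoding, the tag never has to appear in the raw test data; the model scores each candidate tag (or tag pair) against the observed character.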