How to use target label as feature in CRF++?

I'm trying to build a Chinese word segmenter as described in this paper. If I understand it correctly, they use a 2-tag segmentation approach with CRF++. My question is: how can I express the tag transition from that paper (e.g. T(-1)C(0)T(0)) as a feature template in CRF++? For example, given training data like this:

共 B
同 M
創 B
造 M
美 B <- Current word
好 M
的 B
新 B
世 B
紀 M

Is it possible to have the feature T(-1)C(0)T(0) -> M/美/B in CRF++? I've tried adding a feature template such as U01:%x[-1,1]/%x[0,0]/%x[0,1], but that failed. I'm also confused: since the B/M tag is what we want to predict on the test data (e.g. raw Chinese sentences), how can the paper use the tag as a feature? Or did I misunderstand something?

There is 1 solution below

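A feature like T(-1)C(0)T(0) -> M/美/B can be expressed in CRF++ with a bigram template:

B01:%x[0,0]

Note the prefix: B, not U. A B template combines the observation with both the previous label and the current label, which gives features such as M/美/B.

If you use U01:%x[0,0] instead, you get a feature like "美/B", which involves only the current label.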
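To make the macro part concrete, here is a minimal Python sketch of how a %x[row,col] macro could be resolved at the position of 美 in the question's data. This is only an illustration, not CRF++'s actual code: the `expand` helper and the `_PAD` placeholder for out-of-range rows are my own hypothetical names. It shows that the U and B versions of the template resolve to the same observation string; the prefix only decides which labels get attached afterwards.

```python
import re

# Toy rows from the question: (character, tag).
ROWS = [("共", "B"), ("同", "M"), ("創", "B"), ("造", "M"), ("美", "B"),
        ("好", "M"), ("的", "B"), ("新", "B"), ("世", "B"), ("紀", "M")]

# Matches macros of the form %x[row,col]; row is relative, col is absolute.
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")

def expand(template, rows, pos):
    """Replace each %x[row,col] macro with the token at (pos + row, col)."""
    def lookup(match):
        row, col = int(match.group(1)), int(match.group(2))
        idx = pos + row
        if 0 <= idx < len(rows):
            return rows[idx][col]
        return "_PAD"  # hypothetical placeholder for rows outside the sentence
    return MACRO.sub(lookup, template)

pos = 4  # index of 美, the current character
print(expand("U01:%x[0,0]", ROWS, pos))           # U01:美
print(expand("B01:%x[0,0]", ROWS, pos))           # B01:美
print(expand("U02:%x[-1,0]/%x[0,0]", ROWS, pos))  # U02:造/美
```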

This also confused me a bit when I first used CRF++ six years ago. I hope this helps.


I should mention that in a CRF, the definition of a feature includes the label. For example, the following is a single 0-1 (binary) feature: the current character is "美" and the current label is "B".

What a "template" does in CRF++ (a tool that implements CRFs) is enumerate all labels for the context defined in the template.

So in your example, U01:%x[0,0] automatically introduces two features: "U01:美_y=B" and "U01:美_y=M".
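The enumeration can be sketched in a few lines of Python. Again, this is an illustration under my own assumptions rather than what CRF++ does internally: `enumerate_features` is a hypothetical helper, the unigram names follow the "_y=" notation above, and the "_y'=" notation for the previous label is invented here for readability.

```python
from itertools import product

LABELS = ["B", "M"]  # the two tags used in the question's data

def enumerate_features(expanded, labels):
    """List the label-conjoined features that one expanded template yields."""
    if expanded.startswith("U"):
        # Unigram template: one feature per current label.
        return [f"{expanded}_y={y}" for y in labels]
    if expanded.startswith("B"):
        # Bigram template: one feature per (previous label, current label) pair.
        return [f"{expanded}_y'={prev}_y={cur}"
                for prev, cur in product(labels, repeat=2)]
    raise ValueError("template must start with U or B")

unigram = enumerate_features("U01:美", LABELS)
bigram = enumerate_features("B01:美", LABELS)
print(unigram)
print(bigram)
```

With two labels, the unigram template yields two features and the bigram template yields four. The entry with previous label M and current label B is the M/美/B feature from the question.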