User-defined features in CRF++


I tried to add more features to my CRF++ template.

I followed the approach from the question "How can I tell CRF++ classifier that a word x is captilized or understanding punctuations?"

Training sample:

The  DT  0  1   0   1   B-MISC
Oxford  NNP 0   1   0   1   I-MISC
Companion   NNP 0   1   0   1   I-MISC
to  TO  0   0   0   0   I-MISC
Philosophy  NNP 0   1   0   1   I-MISC

Feature template:

# Unigram
U00:%x[-2,0]
U01:%x[-1,0]
U02:%x[0,0]
U03:%x[1,0]
U04:%x[2,0]
U05:%x[-1,0]/%x[0,0]
U06:%x[0,0]/%x[1,0]
U07:%x[-2,0]/%x[-1,0]/%x[0,0]

# Shape feature
U08:%x[-2,2]
U09:%x[-1,2]
U10:%x[0,2]
U11:%x[1,2]
U12:%x[2,2]

B

The training phase is OK, but I get no output from crf_test:

tilney@ubuntu:/data/wikipedia/en$ crf_test -m validation_model test.data
tilney@ubuntu:/data/wikipedia/en$ 

Everything works fine if I leave out the shape features above. Where did I go wrong?

Best answer

I figured this out: the problem was my test data. I assumed the extra features would be taken from the trained model, so my test file had only two columns, word and tag. The shape features in the template read column 2, which a two-column file does not have. It turns out the test file must have exactly the same format as the training data.
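
A quick check along these lines might have caught the mismatch before running crf_test. It is only a sketch in Python and not part of CRF++. The helper names `max_template_column` and `short_lines` are made up for illustration. It assumes whitespace-separated columns, blank lines between sentences, and `%x[row,col]` macros with zero-based column indices.

```python
import re

# Matches CRF++ template macros such as %x[-1,0] or %x[0,2].
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def max_template_column(template_text):
    """Return the largest column index used by any %x[row,col] macro, or -1."""
    cols = []
    for line in template_text.splitlines():
        if line.lstrip().startswith("#"):  # skip template comment lines
            continue
        cols.extend(int(col) for _row, col in MACRO.findall(line))
    return max(cols) if cols else -1


def short_lines(data_text, min_columns):
    """Return (line_number, column_count) for token lines with too few columns."""
    problems = []
    for number, line in enumerate(data_text.splitlines(), start=1):
        fields = line.split()
        if fields and len(fields) < min_columns:  # blank lines separate sentences
            problems.append((number, len(fields)))
    return problems


# Toy inputs modelled on the question: the template reads column 2,
# but the test data only has a word column and a tag column.
template = "# Unigram\nU02:%x[0,0]\n# Shape feature\nU10:%x[0,2]\nB\n"
test_data = "The DT\nOxford NNP\n\nto TO\n"

needed = max_template_column(template) + 1
for number, count in short_lines(test_data, needed):
    print(f"line {number}: {count} columns, template needs at least {needed}")
```

In practice you would read the template and test file from disk instead of using the inline strings. Any reported line means the template refers to a column that the data file does not contain.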