I have a phrase like this one:
Apple iPhone 5 White 16GB
and I want to tag it this way:
B M C S
where B=Brand (Apple), M=Model (iPhone 5), C=Color (White), S=Size (16GB).
A classifier must learn the sequence pattern, so I am thinking of using an SVM or a CRF.
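To make "learning the sequence pattern" concrete, here is a rough sketch of the per-token features I imagine feeding to either model. The function name `token_features` and the feature names are my own placeholders, not anything from NLTK, and the choice of features is only a guess at what might be useful.

```python
def token_features(tokens, i):
    """Build a feature dict for tokens[i], including its left and right neighbours."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),                            # normalised surface form
        "is_title": tok.istitle(),                       # True for e.g. "Apple", "White"
        "has_digit": any(c.isdigit() for c in tok),      # True for e.g. "5", "16GB"
        "position": i,                                   # brands tend to come first
        "prev": tokens[i - 1].lower() if i > 0 else "<START>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>",
    }

tokens = "Apple iPhone 5 White 16GB".split()
features = [token_features(tokens, i) for i in range(len(tokens))]
```

The "prev" and "next" entries are what would give a per-token classifier such as an SVM some view of the sequence. A CRF would additionally model the transitions between the tags themselves.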
My question is: what is the best way to tag a phrase like this? I will use the NLTK library for Python.
What do you think of a format like {Apple}\B {iPhone 5}\M....? Or is there a better one?
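For illustration, this is how I would expect to turn the bracketed format into one tag per token, which is what a sequence model usually trains on. It is only a sketch: the `LABELS` mapping, the `spans_to_bio` helper, and the B-/I- prefix scheme are my own assumptions, and the example string is the full annotation I have in mind for the phrase above.

```python
import re

# Matches one annotated span of the form {text}\TAG, e.g. {iPhone 5}\M
SPAN = re.compile(r"\{([^}]*)\}\\(\w+)")

# Assumed expansion of the one-letter tags into readable label names
LABELS = {"B": "BRAND", "M": "MODEL", "C": "COLOR", "S": "SIZE"}

def spans_to_bio(annotated):
    # Returns a list of (token, tag) pairs; the first token of a span gets
    # a "B-" prefix and the following tokens of the same span get "I-".
    pairs = []
    for text, tag in SPAN.findall(annotated):
        label = LABELS.get(tag, tag)
        for i, tok in enumerate(text.split()):
            prefix = "B-" if i == 0 else "I-"
            pairs.append((tok, prefix + label))
    return pairs

example = r"{Apple}\B {iPhone 5}\M {White}\C {16GB}\S"
print(spans_to_bio(example))
# [('Apple', 'B-BRAND'), ('iPhone', 'B-MODEL'), ('5', 'I-MODEL'),
#  ('White', 'B-COLOR'), ('16GB', 'B-SIZE')]
```

With this scheme the multi-word model "iPhone 5" stays recoverable as one unit after tokenisation.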
Is there also a way to use a seed dictionary (of brands, for example) so that NLTK automatically tags a list of phrases for me?
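To show what I mean by seed-dictionary tagging, here is a plain-Python sketch (it does not use NLTK). The `SEEDS` entries, the `seed_tag` helper, and the "O" tag for unmatched tokens are all made up for the example. It does a greedy longest-match lookup, so multi-word dictionary entries would also work.

```python
# Hypothetical seed dictionary: lowercase phrase -> label
SEEDS = {
    "apple": "BRAND",
    "samsung": "BRAND",
    "white": "COLOR",
    "black": "COLOR",
}

def seed_tag(phrase, seeds=SEEDS, max_len=3):
    # Tag tokens found in the seed dictionary; everything else gets "O".
    tokens = phrase.split()
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate first so multi-word entries win
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = " ".join(tokens[i:i + n]).lower()
            if key in seeds:
                label = seeds[key]
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return list(zip(tokens, tags))

print(seed_tag("Apple iPhone 5 White 16GB"))
# [('Apple', 'B-BRAND'), ('iPhone', 'O'), ('5', 'O'),
#  ('White', 'B-COLOR'), ('16GB', 'O')]
```

The tokens left as "O" are the ones I would still have to label by hand, or leave for the trained classifier to fill in.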