Syntaxnet POS tagger use of capitalization


I want to use Syntaxnet to get POS tags for tweets (and, more specifically, to extract named entities from the text). However, Parsey McParseface is case-sensitive by default. Since tweets often do not use capitalization, I was thinking of using a case-less tagger. I found something about capitalization in the code, but I am not sure whether or how to use it:

https://github.com/dsindex/syntaxnet/blob/15831789a706cbc482efeeec635a8f0315d0b3fb/English/context.pbtxt

Let me give an example to make this clearer. Consider the sentences "John gave the money to Maria" and "john gave the money to maria" (with and without capitalization):

With caps:

gave VBD ROOT
 +-- John NNP nsubj
 +-- money NN dobj
 |   +-- the DT det
 +-- to IN prep
     +-- Maria NNP pobj

Without caps:

gave VBD ROOT
 +-- john NNP nsubj
 +-- money NN dobj
 |   +-- the DT det
 +-- to TO prep
     +-- maria NN pobj

As you can see, Maria is tagged NNP, whereas maria (without caps) is tagged NN. When extracting named entities, it matters whether a word is tagged NN or NNP.
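
To show concretely why the tag matters, here is a minimal sketch. It assumes that named-entity candidates are simply the tokens tagged NNP. The token/tag pairs are copied by hand from the two parses above, and `proper_nouns` is a hypothetical helper of mine, not part of Syntaxnet:

```python
# Token/tag pairs copied from the two parser outputs above, in sentence order.
with_caps = [
    ("John", "NNP"), ("gave", "VBD"), ("the", "DT"),
    ("money", "NN"), ("to", "IN"), ("Maria", "NNP"),
]
without_caps = [
    ("john", "NNP"), ("gave", "VBD"), ("the", "DT"),
    ("money", "NN"), ("to", "TO"), ("maria", "NN"),
]


def proper_nouns(tagged_tokens):
    """Return the tokens tagged NNP (hypothetical named-entity filter)."""
    return [token for token, tag in tagged_tokens if tag == "NNP"]


print(proper_nouns(with_caps))
print(proper_nouns(without_caps))
```

With the capitalized input both names survive the filter; with the lowercased input only john does, so maria would be lost as an entity.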

Is there a way to improve this? Is there a case-less Parsey McParseface for Syntaxnet?
