It seems to me that Parsey has severe issues with correctly tagging questions and any sentence with "is" in it.
Text: Is Barrack Obama from Hawaii?
GCloud Tokens (correct):
- Is - [root] VERB
- Barrack - [nn] NOUN
- Obama - [nsubj] NOUN
- from - [adp] PREP
- Hawaii - [pobj] NOUN
Parsey Tokens (wrong):
- Is - [cop] VERB
- Barrack - [nsubj] NOUN
- Obama - [root] NOUN
- from - [adp] PREP
- Hawaii - [pobj] NOUN
Parsey decides to make the noun (!) Obama the root, which messes up everything else.
Text: My name is Philipp
GCloud Tokens (correct):
- My - [poss] PRON
- name - [nsubj] NOUN
- is - [root] VERB
- Philipp - [attr] NOUN
Parsey Tokens (wrong):
- My - [poss] PRON
- name - [nsubj] NOUN
- is - [cop] VERB
- Philipp - [root] NOUN
Again, Parsey chooses the noun as the root and labels "is" as cop instead of root.
Any ideas why this is happening and how I could fix it?
Thanks, Phil
I have to qualify my answer: I have limited knowledge of Parsey McParseface. However, since nobody else has answered, I hope I can add some value.
I think a major problem with most machine learning models is a lack of interpretability. This relates to your first question: "why is this happening?" It's very difficult to tell, because this tool is built on a 'black box' model, namely a neural network. I will say that, given the strong claims made about Parsey, it is very surprising that a common word like 'is' fools it consistently. Is it possible you've made a mistake somewhere? It's hard to tell without a code sample.
I'll assume you haven't made a mistake. In that case, I think you could solve this (or at least mitigate it) by taking advantage of your own observation that the word 'is' seems to throw the model off. You could simply check each sentence for the word 'is' and use GCloud (or another parser) when it appears. Conveniently, once you are running both parsers, you can also use GCloud as a fallback for any other cases where Parsey turns out to fail.
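To make the fallback idea concrete, here is a minimal sketch in Python. It does not call Parsey or GCloud directly: the two parsers are passed in as plain functions, and the names `needs_fallback`, `parse_with_fallback`, and `TRIGGER_WORDS` are my own placeholders, not part of either tool. I'm also assuming each parser returns a list of (word, label, tag) tuples; adapt that to whatever your wrappers actually return.

```python
import re
from typing import Callable, List, Tuple

# Assumed output shape: one (word, dependency label, POS tag) tuple per token.
Token = Tuple[str, str, str]
Parser = Callable[[str], List[Token]]

# Hypothetical trigger list; extend it as you find other failure cases.
TRIGGER_WORDS = {"is"}


def needs_fallback(sentence: str) -> bool:
    """Return True if the sentence contains a trigger word as a whole word."""
    # Match whole words only, so "this" does not count as "is".
    words = re.findall(r"[a-z']+", sentence.lower())
    return any(word in TRIGGER_WORDS for word in words)


def parse_with_fallback(sentence: str, primary: Parser, fallback: Parser) -> List[Token]:
    """Use the fallback parser for sentences known to trip up the primary one."""
    if needs_fallback(sentence):
        return fallback(sentence)
    return primary(sentence)
```

You would call it as `parse_with_fallback(text, parsey_parse, gcloud_parse)`, where `parsey_parse` and `gcloud_parse` are your own wrapper functions around the two tools. Matching on whole words keeps a sentence like "This works" on the primary parser.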
As for improving the base model: if you care enough, you could rebuild it by following the original paper, and perhaps adjust the training to suit your situation.