Parsey McParseface incorrectly identifying root on questions


It seems to me that Parsey has severe issues with correctly parsing questions and any sentence containing "is".


Text: Is Barrack Obama from Hawaii?

GCloud Tokens (correct):

  • Is - [root] VERB
  • Barrack - [nn] NOUN
  • Obama - [nsubj] NOUN
  • from - [prep] ADP
  • Hawaii - [pobj] NOUN

Parsey Tokens (wrong):

  • Is - [cop] VERB
  • Barrack - [nsubj] NOUN
  • Obama - [root] NOUN
  • from - [prep] ADP
  • Hawaii - [pobj] NOUN

Parsey decides to make the noun (!) "Obama" the root, which throws off the rest of the parse.


Text: My name is Philipp

GCloud Tokens (correct):

  • My [poss] PRON
  • name [nsubj] NOUN
  • is [root] VERB
  • Philipp [attr] NOUN

Parsey Tokens (incorrect):

  • My [poss] PRON
  • name [nsubj] NOUN
  • is [cop] VERB
  • Philipp [root] NOUN

Again, Parsey chooses the NOUN as the root and struggles with the copula (cop) relation.


Any ideas why this is happening and how I could fix it?

Thanks, Phil


There are 3 best solutions below


I have to qualify my answer: I have limited knowledge of Parsey McParseface. However, since nobody else has answered, I hope I can add some value.

I think a major problem with most machine learning models is a lack of interpretability. This relates to your first question: "why is this happening?" It's very difficult to tell because this tool is founded on a 'black box' model, namely, a neural network. I will say that it seems extremely surprising, given the strong claims made about Parsey, that a common word like 'is' fools it consistently. Is it possible you've made some mistake? It's hard to tell without a code sample.

I'll assume you haven't made a mistake, in which case, I think you could solve this (or mitigate it) by taking advantage of your observation that the word 'is' seems to throw the model off. You could simply check the sentence in question for the word 'is' and use GCloud (or another parser) in that case. Conveniently, once you are using both, you can use GCloud as a fallback for other cases where Parsey seems to fail, should you find them in the future.
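If you do route sentences between the two parsers, the check can be as simple as a word-boundary match on forms of "be". A minimal sketch of that idea; `parsey` and `gcloud` are hypothetical stand-ins for your actual parser client calls:

```python
import re

# Forms of "be" that seem to trip Parsey up, per the examples above.
COPULA = re.compile(r"\b(am|is|are|was|were|be)\b", re.IGNORECASE)

def parse(sentence, parsey, gcloud):
    """Route copular sentences to GCloud, everything else to Parsey.

    `parsey` and `gcloud` are callables wrapping the real parsers
    (hypothetical names -- substitute your own client code).
    """
    if COPULA.search(sentence):
        return gcloud(sentence)
    return parsey(sentence)
```

Once both parsers are wired in, the same `gcloud` branch also serves as a fallback for any other sentence patterns you later find Parsey failing on.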

As for improving the base model, if you care enough, you could recreate it using the original paper, and perhaps optimize the training to suit your situation.


Regarding the first example, it appears that Parsey's training data is quite old and doesn't contain the word "Barack" at all. If you replace Barack Obama with Bill Clinton you get a correct parse.

Input: Is Bill Clinton from Hawaii ?

Parse:

  Is VBZ ROOT
   +-- Clinton NNP nsubj
   |   +-- Bill NNP nn
   +-- from IN prep
   |   +-- Hawaii NNP pobj
   +-- ? . punct

The second example is instead correctly parsed according to Stanford Dependencies (see "The treatment of copula verbs" in http://nlp.stanford.edu/software/dependencies_manual.pdf).

Input: My name is Philip

Parse:

  Philip NNP ROOT
   +-- name NN nsubj
   |   +-- My PRP$ poss
   +-- is VBZ cop
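Both analyses cover the same tokens and differ only in which token is the root. Writing each parse as a 1-based head array (0 marks the root) makes the disagreement explicit; this encoding is just an illustration, not either parser's actual output format:

```python
tokens = ["My", "name", "is", "Philipp"]

# 1-based index of each token's head; 0 marks the root.
gcloud_heads = [2, 3, 0, 3]  # "is" is the root; "Philipp" attaches to it
parsey_heads = [2, 4, 4, 0]  # "Philipp" is the root; "is" attaches as cop

def root_word(tokens, heads):
    """Return the token whose head is 0, i.e. the root of the parse."""
    return tokens[heads.index(0)]
```

Here `root_word(tokens, gcloud_heads)` gives "is" while `root_word(tokens, parsey_heads)` gives "Philipp": both are well-formed trees, and the Stanford Dependencies copula convention prefers the second.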


Since Parsey correctly tagged "Barrack Obama" as two nouns, I don't think unfamiliarity with the name is the problem. I think Parsey simply refuses to make "is" the root.

In theoretical dependency grammar, a noun is never the root of a complete sentence. Parsey, however, does not follow theory; it has a strong preference for making content words into heads. My guess is that when you say "X is Y" it decides the head of the sentence should be one of the content words rather than "is", because "is" is not an informative word.

...Except for the Bill Clinton example, which may prove me wrong! I have not yet gotten Parsey working on my own computer, so I'm not sure.