I tried out an example of parsing given on the spaCy website https://spacy.io/docs/usage/dependency-parse, but my results are not the same as those demonstrated on the website, and they appear incorrect. I am using spaCy version 1.9.0, model en_core_web_md, and Python version 3.5.2. The example is reproduced below:
import spacy
from spacy.symbols import nsubj

nlp = spacy.load('en_core_web_md')
doc = nlp(u'Credit and mortgage account holders must submit their requests.')
holders = doc[4]
span = doc[holders.left_edge.i : holders.right_edge.i + 1]
span.merge()
span.merge() returns the merged token, which prints as
holders
Then continuing with the example:
for word in doc:
    print(word.text, word.pos_, word.dep_, word.head.text)
And the output is
Credit NOUN npadvmod submit
and CCONJ cc Credit
mortgage NOUN compound account
account NOUN conj Credit
holders NOUN nsubj submit
must VERB aux submit
submit VERB ROOT submit
their ADJ poss requests
requests NOUN dobj submit
. PUNCT punct submit
However, the website demonstrates a different output:
# Credit and mortgage account holders nsubj NOUN submit
# must VERB aux submit
# submit VERB ROOT submit
# their DET det requests
# requests NOUN dobj submit
In the expected result, holders.left_edge.i and holders.right_edge.i are not identical, so span.merge() gives us the full phrase "Credit and mortgage account holders" rather than the single token "holders".
Further, I print the noun-chunks of the original Doc object:
doc = nlp(u'Credit and mortgage account holders must submit their requests.')
for nchunk in doc.noun_chunks:
    print(nchunk)
which gives
holders
their requests
I am brand-new to spacy and NLP. Please excuse me if I missed something obvious.
You see the difference because you are using the en_core_web_md model for parsing. On their website, they used the default English model (en_core_web_sm) for the dependency-parse example. See this link for a detailed list of models.

You can make an informed choice about which model to use by referring to the spaCy model releases page, which provides accuracy results on a test corpus for the various models and NLP tasks.