NLP: Resolve coreference pronoun in blocks


I'm planning on executing my NLP pipeline on a corpus of books. Since resolving coreferences is an intensive process, I wouldn't be able to process an entire book or maybe even an entire chapter at a time. I was planning on splitting the text into sizeable chunks to resolve coreferences.

The issue I need help with is how to resolve pronouns in Group 2 when the noun they refer to is located in Group 1. Is there a way to seed the dependencies from Group 1 into the following groups? If not, how is this typically handled?

For what it's worth, I'm using CoreNLP, but I'm open to other tools.

"Group 1": George was born in New York. George is 10.

"Group 2": He loves New York city.


There is 1 best solution below


This may be worth reading: https://stanfordnlp.github.io/CoreNLP/memory-time.html. The coreference documentation at https://stanfordnlp.github.io/CoreNLP/coref.html also mentions the maxMentionDistance setting, which I remember modifying at some point when I used CoreNLP for coreference resolution. I did that in Java directly, though; since you've tagged your question with NLTK, I'm not sure whether the setting can also be changed through the NLTK implementation.
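
For reference, a minimal sketch of how the setting could be passed from Python without going through Java, assuming you run the CoreNLP server locally and that the property is spelled `coref.maxMentionDistance` as on the coref documentation page. The helper names (`coref_properties`, `server_url`), the port, and the annotator list are my assumptions, not something taken from your setup. The code only builds the request URL; you would still POST the text of each chunk to it with whatever HTTP client you prefer.

```python
import json
from urllib.parse import urlencode


def coref_properties(max_mention_distance=50):
    """Build a CoreNLP properties dict (hypothetical helper).

    Assumes the neural coref system; adjust the annotators and the
    algorithm to whatever your pipeline actually uses.
    """
    return {
        "annotators": "tokenize,ssplit,pos,lemma,ner,parse,coref",
        "coref.algorithm": "neural",
        # How many mentions back the system looks for an antecedent.
        "coref.maxMentionDistance": str(max_mention_distance),
        "outputFormat": "json",
    }


def server_url(props, base="http://localhost:9000"):
    """Encode the properties as the `properties` URL parameter.

    The base URL is an assumption: a CoreNLP server running locally
    on port 9000.
    """
    return base + "/?" + urlencode({"properties": json.dumps(props)})


url = server_url(coref_properties(max_mention_distance=25))
```

A smaller value should reduce the work per mention at the cost of missing distant antecedents, so it is worth measuring on a sample chapter before settling on a number.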

I'd use common sense here and stick to conceptual blocks as much as possible: if chapters are too big, try a couple of paragraphs at a time. You could perhaps 'glue' the mention chains back together in post-processing, but I suspect that would not be straightforward.
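
One way the gluing could work, as a rough sketch rather than anything CoreNLP provides: let consecutive chunks overlap by a paragraph or so, run coref on each chunk separately, and then merge chains from different chunks whenever they share a mention inside the overlap. Everything here is hypothetical: the function names, the size budget, and the representation of a mention as `(paragraph_index, start_char, end_char)`. Converting your coref tool's output into that form is assumed and not shown.

```python
from typing import List, Set, Tuple

# A mention is keyed by its position in the whole book, so the same
# mention seen in two overlapping chunks gets the same key.
Mention = Tuple[int, int, int]  # (paragraph_index, start_char, end_char)


def chunk_paragraphs(paragraphs: List[str], max_chars: int,
                     overlap: int = 1) -> List[List[int]]:
    """Group paragraph indices into chunks of roughly max_chars,
    repeating the last `overlap` paragraphs at the start of the next chunk."""
    chunks: List[List[int]] = []
    current: List[int] = []
    size = 0
    for i, para in enumerate(paragraphs):
        if current and size + len(para) > max_chars:
            chunks.append(current)
            # Carry the tail of this chunk over as context for the next one.
            current = current[-overlap:] if overlap else []
            size = sum(len(paragraphs[j]) for j in current)
        current.append(i)
        size += len(para)
    if current:
        chunks.append(current)
    return chunks


def merge_chains(chains: List[Set[Mention]]) -> List[Set[Mention]]:
    """Union all chains that share at least one mention."""
    merged: List[Set[Mention]] = []
    for chain in chains:
        combined = set(chain)
        untouched = []
        for existing in merged:
            if existing & combined:
                combined |= existing  # shared mention: same entity
            else:
                untouched.append(existing)
        untouched.append(combined)
        merged = untouched
    return merged


paragraphs = ["George was born in New York.", "George is 10.",
              "He loves New York city."]
chunks = chunk_paragraphs(paragraphs, max_chars=45, overlap=1)

# Hand-written stand-ins for per-chunk coref output:
chain_from_chunk_1 = {(0, 0, 6), (1, 0, 6)}  # "George", "George"
chain_from_chunk_2 = {(1, 0, 6), (2, 0, 2)}  # "George", "He"
entities = merge_chains([chain_from_chunk_1, chain_from_chunk_2])
```

With the example from the question, the second chunk starts with "George is 10." again, so "He" has an antecedent to attach to, and the shared "George" mention is what links the two chains. This only works when the antecedent falls inside the overlap; a pronoun whose referent is several chunks back would still be missed.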