Which coreference chains does each sentence relate to in NeuralCoref?


I am using neuralcoref for coreference resolution on a text.

I want to know which coreference clusters each sentence has mentions from. For example, sentence 1 has mentions from coreference clusters 1 and 4, and sentence 2 has mentions from clusters 10 and 14.

How can I do this?


You can go through the words in each sentence and populate a dictionary of sentence -> clusters whenever a word is part of a cluster. This assumes each mention is a single word, though. If you want to handle clusters whose mentions span multiple words (e.g. "her new friend"), you can extend it to check bi-grams or tri-grams as well.
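A minimal, library-free sketch of that n-gram extension is below. To keep it runnable on its own, it works on plain token-index tuples instead of spaCy objects; the helper name `clusters_by_sentence_ngrams` and the input layout are assumptions, not part of neuralcoref. With neuralcoref you would presumably build the inputs from `doc.sents` (`sent.start`, `sent.end`) and from each cluster's `mentions` and index `i` in `doc._.coref_clusters`.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical plain-data stand-ins for spaCy/neuralcoref objects:
# a (start, end) token range with `end` exclusive, like spaCy's Span.start/Span.end.
Range = Tuple[int, int]


def clusters_by_sentence_ngrams(
    sentences: List[Range],
    mention_to_cluster: Dict[Range, int],
    max_n: int = 3,
) -> Dict[Range, Set[int]]:
    """Scan every n-gram (n = 1..max_n) inside each sentence and record the
    cluster index of any n-gram that exactly matches a known mention.

    Assumed inputs (hypothetical layout), e.g. built from neuralcoref as:
      sentences = [(s.start, s.end) for s in doc.sents]
      mention_to_cluster = {(m.start, m.end): c.i
                            for c in doc._.coref_clusters for m in c.mentions}
    """
    mapping: Dict[Range, Set[int]] = {}
    for sent_start, sent_end in sentences:
        found: Set[int] = set()
        for start in range(sent_start, sent_end):
            # Only consider n-grams that stay inside the current sentence.
            for end in range(start + 1, min(start + max_n, sent_end) + 1):
                cluster = mention_to_cluster.get((start, end))
                if cluster is not None:
                    found.add(cluster)
        mapping[(sent_start, sent_end)] = found
    return mapping


# Hand-tokenized example: "Angela lives in Boston . | She is happy . |
# Her friend Nikki Jones is jolly ." with a two-token mention "Nikki Jones".
sentences = [(0, 5), (5, 9), (9, 16)]
mentions = {(0, 1): 0, (5, 6): 0, (9, 10): 0, (11, 13): 1}
print(clusters_by_sentence_ngrams(sentences, mentions))
```

With `max_n=1` the two-token mention `(11, 13)` is missed, which is exactly the single-word limitation described above; raising `max_n` trades a little extra scanning for catching longer mentions.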

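import spacy
import neuralcoref

# neuralcoref targets spaCy 2.x; 'en' is the spaCy 2 shortcut for the English model
nlp = spacy.load('en')
neuralcoref.add_to_pipe(nlp)

doc = nlp('Angela lives in Boston. She is Happy. Nikki is her new friend. She is jolly too.')
print('*** cluster : tokens mapping ***')
print(doc._.coref_clusters)

# sentence span -> set of cluster main mentions that appear in that sentence
mapping = {}
for sent in doc.sents:
    mapping[sent] = set()

    # iterate over every token, including the last one in the sentence
    for idx in range(len(sent)):
        span = sent[idx:idx+1]    # single-token span; edit this to handle n-grams
        if span._.is_coref:       # True if neuralcoref links this span to a cluster
            key = span._.coref_cluster.main
            mapping[sent].add(key)

print('*** sentence : clusters mapping ***')
print(mapping)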

The output looks like this:

*** cluster : tokens mapping ***
[Angela: [Angela, She, her], Nikki: [Nikki, She]]

*** sentence : clusters mapping ***
{Angela lives in Boston.: {Angela}, She is Happy.: {Angela}, Nikki is her new friend.: {Nikki, Angela}, She is jolly too.: {Nikki}}
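A different technique that avoids n-gram scanning altogether is to go the other way: loop over each cluster's mentions and look up which sentence contains the mention's first token. This handles mentions of any length. The sketch below is library-free and uses plain token-index tuples; the helper name `clusters_by_sentence` and the input layout are assumptions, and it assumes sentences are sorted, cover the whole document, and that a mention never crosses a sentence boundary. With spaCy objects, `mention.sent` should give you the containing sentence directly.

```python
from bisect import bisect_right
from typing import Dict, List, Set, Tuple

# Hypothetical plain-data layout: (start, end) token ranges, `end` exclusive.
Range = Tuple[int, int]


def clusters_by_sentence(
    sentences: List[Range],
    clusters: List[List[Range]],
) -> Dict[int, Set[int]]:
    """Map sentence index -> set of cluster indices with a mention in it.

    `sentences` must be sorted, non-overlapping ranges covering the document;
    `clusters[i]` lists the (start, end) ranges of cluster i's mentions.
    """
    starts = [start for start, _ in sentences]
    mapping: Dict[int, Set[int]] = {i: set() for i in range(len(sentences))}
    for cluster_idx, mentions in enumerate(clusters):
        for mention_start, _ in mentions:
            # Index of the last sentence that starts at or before the mention.
            sent_idx = bisect_right(starts, mention_start) - 1
            mapping[sent_idx].add(cluster_idx)
    return mapping


# Hand-tokenized version of the answer's text; cluster 0 = Angela, 1 = Nikki.
sentences = [(0, 5), (5, 9), (9, 15), (15, 20)]
clusters = [[(0, 1), (5, 6), (11, 12)], [(9, 10), (15, 16)]]
print(clusters_by_sentence(sentences, clusters))
# {0: {0}, 1: {0}, 2: {0, 1}, 3: {1}}
```

This is the sentence -> cluster numbers format from the question, and it agrees with the sentence : clusters mapping printed above (Angela in sentences 1 to 3, Nikki in sentences 3 and 4).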