I am extracting entities from scientific text (using scispacy), and later I want to extract relations between them using hand-written rules. I can already extract entities and their character spans, and I can also get the POS and dependency tags for tokens and noun chunks. So I am comfortable with the two tasks separately, but I have been stuck for a while on bringing them together.
The idea is to be able to write rules such as (just an example): if a sentence/clause contains two entities where the first is a DRUG/CHEMICAL and is the subject, and the second is a DISEASE and is an object, then infer a 'treatment' relation between the two.
If anyone has any hints on how to approach this task, I would really appreciate it. Thank you!
S.
What I am doing to extract entities:
```python
doc = nlp(text)  # text with more than one sentence
for ent in doc.ents:
    ...  # get information about the ent, e.g. its character span
```
Getting dependency information (for noun chunks and for tokens):
```python
for chunk in doc.noun_chunks:
    print(f"Text: {chunk.text}, Root text: {chunk.root.text}, "
          f"Root dep: {chunk.root.dep_}, Root head text: {chunk.root.head.text}, "
          f"POS: {chunk.root.head.pos_}")
```
```python
for token in doc:
    print(f"Text: {token.text}, DEP label: {token.dep_}, "
          f"Head text: {token.head.text}, Head POS: {token.head.pos_}, "
          f"Children: {list(token.children)}")
```
You can use the `merge_entities` pipeline component to convert entities into single tokens, which would simplify what you're trying to do: after merging, each entity behaves like an ordinary token with its own `dep_` and `ent_type_`, so your dependency-based rules can look at entities directly. There is also a `merge_noun_chunks` component that merges noun chunks the same way. You add either one with `nlp.add_pipe("merge_entities")`.
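Putting the two pieces together, here is a minimal sketch of the rule from the question. The function name and the lightweight `Tok` stand-in class are hypothetical; `Tok` just mimics the attributes (`ent_type_`, `dep_`, `head`) that real spaCy tokens expose after `merge_entities`, so the rule logic runs without loading a model. The CHEMICAL/DISEASE labels are what scispacy's bc5cdr NER model produces; adjust to your model's label set.

```python
from dataclasses import dataclass

# Stand-in for a spaCy token after merge_entities: each entity is a single
# token carrying its NER label in ent_type_ and its grammatical role in dep_.
# With a real pipeline you would instead do nlp.add_pipe("merge_entities")
# and iterate over the tokens of each sentence in doc.sents.
@dataclass
class Tok:
    text: str
    ent_type_: str = ""
    dep_: str = ""
    head: "Tok" = None

SUBJECT_DEPS = {"nsubj", "nsubjpass"}
OBJECT_DEPS = {"dobj", "obj", "pobj"}

def infer_treatment_relations(sent_tokens):
    """Return ('treatment', drug, disease) triples where a CHEMICAL subject
    and a DISEASE object hang off the same head verb."""
    subjects = [t for t in sent_tokens
                if t.ent_type_ == "CHEMICAL" and t.dep_ in SUBJECT_DEPS]
    objects = [t for t in sent_tokens
               if t.ent_type_ == "DISEASE" and t.dep_ in OBJECT_DEPS]
    relations = []
    for s in subjects:
        for o in objects:
            if s.head is o.head:  # both attach to the same governing verb
                relations.append(("treatment", s.text, o.text))
    return relations

# Tiny demo: "Aspirin treats headaches."
verb = Tok("treats", dep_="ROOT")
verb.head = verb
drug = Tok("Aspirin", ent_type_="CHEMICAL", dep_="nsubj", head=verb)
disease = Tok("headaches", ent_type_="DISEASE", dep_="dobj", head=verb)
print(infer_treatment_relations([drug, disease]))
# -> [('treatment', 'Aspirin', 'headaches')]
```

The same loop body works unchanged on real merged tokens, since it only reads `ent_type_`, `dep_`, and `head`.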