I have around 7,000 sentences on which I have run a refined named entity recognition (NER) model (i.e., for specific entities) using spaCy. Now I want to do relationship extraction (basically causal inference), and I do not know how to use the NER output to build a training set.
As far as I have read, there are different approaches to relationship extraction:
- 1) Handwritten patterns
- 2) Supervised machine learning
- 3) Semi-supervised machine learning
Since I want to use supervised machine learning, I need training data.
It would be nice if anyone could give me some direction; many thanks. Here is a screenshot of my data frame; the entities are provided by a customised spaCy model. I have access to the syntactic dependencies and part-of-speech tags of each sentence, as given by spaCy:
It seems that your dataset is some kind of well-structured technical writing, so part-of-speech tags alone may be enough for the extraction you want.
I would recommend reading the paper "Identifying Relations for Open Information Extraction" and understanding the POS-tag-based pattern it uses.
The piece of code below tags a sentence with part-of-speech tags and then looks for sequences that match the so-called ReVerb pattern.
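A minimal sketch of the ReVerb syntactic constraint (V | V P | V W* P) is given here. It works on plain (token, POS) pairs so it runs without a spaCy model. Assumptions: the tags follow the Universal POS tagset that spaCy exposes as `token.pos_` (with spaCy you would build the input as `[(t.text, t.pos_) for t in doc]`); the one-letter codes and the names `POS_CODES`, `REVERB` and `reverb_relations` are illustrative choices, not part of any library; and only the syntactic constraint is implemented, not ReVerb's lexical constraint or its argument extraction.

```python
import re
from typing import List, Tuple

# Map Universal POS tags (as in spaCy's token.pos_) to one-letter codes,
# so the ReVerb constraint can be written as an ordinary regular expression.
POS_CODES = {
    "VERB": "V", "AUX": "V",      # verbs, including auxiliaries like "is"
    "PART": "T",                  # particles, incl. the infinitive marker "to"
    "ADV": "R",                   # adverbs
    "NOUN": "N", "PROPN": "N",    # nouns
    "ADJ": "J", "PRON": "O", "DET": "D",
    "ADP": "P",                   # prepositions
}

# ReVerb: V | V P | V W* P, where
#   V = verb particle? adv?
#   W = noun | adj | adv | pron | det
#   P = prep | particle | inf. marker
# The outer "+" merges adjacent matches and greedy matching keeps the longest one.
REVERB = re.compile(r"(?:VT?R?(?:[NJROD]*[PT])?)+")


def reverb_relations(tagged: List[Tuple[str, str]]) -> List[Tuple[int, int, str]]:
    """Return (start, end, phrase) per relation phrase; end is exclusive."""
    # Every token becomes exactly one character ("x" for other tags),
    # so character offsets in `codes` are also token indices.
    codes = "".join(POS_CODES.get(pos, "x") for _, pos in tagged)
    results = []
    for m in REVERB.finditer(codes):
        start, end = m.span()
        phrase = " ".join(tok for tok, _ in tagged[start:end])
        results.append((start, end, phrase))
    return results
```

For example, for "Hypertension is a risk factor for stroke" tagged by hand as NOUN AUX DET NOUN NOUN ADP NOUN, `reverb_relations` returns `[(1, 6, 'is a risk factor for')]`. Encoding tags as characters keeps the pattern readable and lets Python's `re` handle the longest-match logic.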
One part is missing, namely finding the closest noun phrases to the left and right of the matched pattern, but I leave that as an exercise. I also wrote a blog post with a more detailed example. I hope it helps.
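A minimal sketch of that missing step, assuming the relation phrase is already known as a token span `(start, end)` with an exclusive end (for example from a ReVerb-style matcher). This is a crude heuristic, not ReVerb's own argument extractor: a "noun phrase" here is just a maximal run of DET/ADJ/NUM/NOUN/PROPN/PRON tags that contains at least one noun or pronoun, and `noun_phrase_spans` and `closest_arguments` are hypothetical helper names. With spaCy, `doc.noun_chunks` (which uses the dependency parse) would likely give better noun phrases.

```python
from typing import List, Optional, Tuple

Tagged = List[Tuple[str, str]]  # (token text, Universal POS tag)

# Tags allowed inside a (very) simple noun phrase, and tags that can head one.
NP_TAGS = {"DET", "ADJ", "NUM", "NOUN", "PROPN", "PRON"}
HEAD_TAGS = {"NOUN", "PROPN", "PRON"}


def noun_phrase_spans(tagged: Tagged) -> List[Tuple[int, int]]:
    """Maximal runs of NP_TAGS containing a head, left to right; end is exclusive."""
    spans = []
    i = 0
    while i < len(tagged):
        if tagged[i][1] not in NP_TAGS:
            i += 1
            continue
        j = i
        while j < len(tagged) and tagged[j][1] in NP_TAGS:
            j += 1
        if any(pos in HEAD_TAGS for _, pos in tagged[i:j]):
            spans.append((i, j))
        i = j
    return spans


def closest_arguments(tagged: Tagged, start: int, end: int) -> Tuple[Optional[str], Optional[str]]:
    """Nearest noun phrase ending before `start` and nearest one starting at or after `end`."""
    spans = noun_phrase_spans(tagged)
    left = [s for s in spans if s[1] <= start]
    right = [s for s in spans if s[0] >= end]

    def text(span: Tuple[int, int]) -> str:
        return " ".join(tok for tok, _ in tagged[span[0]:span[1]])

    # Spans are in sentence order, so the last left span and first right span are closest.
    return (text(left[-1]) if left else None, text(right[0]) if right else None)
```

For "Smoking causes lung cancer." tagged by hand as NOUN VERB NOUN NOUN PUNCT, with the relation "causes" at span `(1, 2)`, `closest_arguments` returns `('Smoking', 'lung cancer')`. Pairs like this could then be reviewed and labelled, for instance as causal or not, to form the supervised training set the question asks about.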