Is there an existing model, or how could I train one, that takes a sentence involving two subjects, like:
[Meiosis] is a type of [cell division]...
and decides if one is the child or parent concept of the other? In this case, cell division is the parent of meiosis.
Are the subjects already identified, i.e., do you know beforehand, for each sentence, which word or sequence of words represents each subject? If so, I think what you are looking for is relationship extraction.
Unsupervised approach
A simple unsupervised approach is to look for patterns using part-of-speech tags, e.g.:
First, you tokenize each sentence and get its PoS tags.
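Below is a dependency-free sketch of this step. The small lexicon is a hypothetical stand-in for a real tagger. With NLTK you would normally call `nltk.pos_tag(nltk.word_tokenize(sentence))` after downloading its tokenizer and tagger models. Tags follow the Penn Treebank set (NN = noun, VBZ = verb, DT = determiner, IN = preposition).

```python
import re

# Hypothetical toy lexicon standing in for a real PoS tagger (e.g. nltk.pos_tag).
# Tags use the Penn Treebank tag set.
TOY_LEXICON = {
    "meiosis": "NN", "is": "VBZ", "a": "DT", "type": "NN",
    "of": "IN", "cell": "NN", "division": "NN",
}


def tokenize(sentence):
    """Split a sentence into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def pos_tag(tokens):
    """Return (token, tag) pairs; punctuation gets '.', unknown words default to NN."""
    tagged = []
    for tok in tokens:
        if re.fullmatch(r"[^\w\s]", tok):
            tagged.append((tok, "."))
        else:
            tagged.append((tok, TOY_LEXICON.get(tok.lower(), "NN")))
    return tagged


tagged = pos_tag(tokenize("Meiosis is a type of cell division."))
print(tagged)
# [('Meiosis', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('type', 'NN'),
#  ('of', 'IN'), ('cell', 'NN'), ('division', 'NN'), ('.', '.')]
```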
Then you build a parser that matches a specific pattern of PoS tags, a pattern that typically expresses a relationship between two subjects/entities/nouns.
NOTE: This pattern is based on this paper: http://www.aclweb.org/anthology/D11-1142
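The pattern from that paper is roughly V | V P | V W* P. Here V is a verb optionally followed by a particle and an adverb. W is a noun, adjective, adverb, pronoun or determiner. P is a preposition, particle or infinitive marker. The sketch below is a dependency-free approximation. It maps each Penn Treebank tag to a one-letter class and runs an ordinary regular expression over the resulting class string. The tag-to-class mapping is a simplifying assumption of this sketch. In NLTK, the same idea is usually written as a chunk grammar for `nltk.RegexpParser` with a rule named REL_PHRASE.

```python
import re


def tag_class(tag):
    """Map a Penn Treebank tag to a one-letter class used by the pattern (simplified)."""
    if tag.startswith("VB"):
        return "v"  # verb
    if tag == "RP":
        return "r"  # particle
    if tag.startswith("RB"):
        return "a"  # adverb
    if tag.startswith(("NN", "JJ", "PRP")) or tag == "DT":
        return "w"  # noun, adjective, pronoun, determiner
    if tag in ("IN", "TO"):
        return "p"  # preposition or infinitive marker
    return "x"      # anything else cannot be part of the phrase


# V = verb particle? adverb?   W = noun/adj/adv/pron/det   P = prep/particle/"to"
# Relation phrase: V | V P | V W* P
REL_PHRASE = re.compile(r"vr?a?(?:[wa]*[pr])?")


def relation_phrase(tagged):
    """Return the first REL_PHRASE in a list of (token, tag) pairs, or None."""
    classes = "".join(tag_class(tag) for _, tag in tagged)
    match = REL_PHRASE.search(classes)
    if match is None:
        return None
    # One class letter per token, so string offsets are also token offsets.
    return " ".join(tok for tok, _ in tagged[match.start():match.end()])


print(relation_phrase([("is", "VBZ"), ("a", "DT"), ("type", "NN"), ("of", "IN")]))
# is a type of
```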
You can then apply the parser to all the tokens/PoS tags except the ones that are part of the subjects/entities.
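Here is a minimal sketch of selecting those tokens. It assumes each subject is given as a half-open [start, end) token span, which is a hypothetical input format; in the question the subjects are marked with brackets instead. This sketch keeps only the tokens between the two subjects, because that is where the relation phrase sits in sentences like the example. Those tokens are what the REL_PHRASE pattern would then be matched against.

```python
def tokens_between(tagged, first_span, second_span):
    """Return the (token, tag) pairs strictly between two subject spans.

    Spans are half-open (start, end) token indices; their order does not matter.
    """
    (_, end_first), (start_second, _) = sorted([first_span, second_span])
    if end_first > start_second:
        raise ValueError("subject spans overlap")
    return tagged[end_first:start_second]


tagged = [("Meiosis", "NN"), ("is", "VBZ"), ("a", "DT"), ("type", "NN"),
          ("of", "IN"), ("cell", "NN"), ("division", "NN")]
# Hypothetical spans: [Meiosis] = tokens 0..1, [cell division] = tokens 5..7
middle = tokens_between(tagged, (0, 1), (5, 7))
print(middle)
# [('is', 'VBZ'), ('a', 'DT'), ('type', 'NN'), ('of', 'IN')]
```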
If the parser outputs a REL_PHRASE, then there is a relationship between the two subjects. You then need to analyse all these patterns and decide which ones represent a parent-of relationship. One way to achieve that is, for instance, by clustering them.
Supervised approach
If your sentences are already tagged with subjects/entities and with the type of relationship, i.e., a supervised scenario, then you can build a model where the features are the words between the two subjects/entities and the label is the type of relationship.
For example, you can build a vector representation of "is a type of" and train a classifier to predict the label "parent of". You will need many examples for this; how many also depends on how many different classes/labels you have.
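Below is a minimal sketch of the supervised setup. It assumes a handful of hypothetical training pairs, each made of the words between the two subjects plus a relationship label; real data would need far more examples. The bag-of-words counts serve as the vector representation. The classifier is a multinomial Naive Bayes with add-one smoothing, written out by hand to stay dependency-free. In practice, scikit-learn's `CountVectorizer` combined with `MultinomialNB` or `LogisticRegression` would do the same job.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training data: (words between the two subjects, relationship label)
TRAIN = [
    ("is a type of", "parent-of"),
    ("is a kind of", "parent-of"),
    ("is a form of", "parent-of"),
    ("is located in", "other"),
    ("is found in", "other"),
    ("causes", "other"),
]


def train_nb(examples):
    """Count words per label (bag-of-words vectors) for multinomial Naive Bayes."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for phrase, label in examples:
        words = phrase.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab


def predict(model, phrase):
    """Return the label with the highest log-probability for the phrase."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)  # log prior
        for w in phrase.lower().split():
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_nb(TRAIN)
print(predict(model, "is a special type of"))  # parent-of
print(predict(model, "is situated in"))        # other
```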