Sentence chunking and dependency in Python


I am using conditional random fields (CRFs) from Python's crf-suite to assign custom tags to the tokens of a sentence, which works quite well. Take this sentence as an example:

"Due to health concerns, Donald stopped eating sweets and ate one apple and two pears."

Each token of the sentence above receives a tag that indicates its function: "Motivation (M)", "Actor (A)", "Action (V)", "Object (P)", or "none (O)".

Token = ["Due", "to", "health", "concerns", ",", "Donald", "stopped", "eating", "sweets", "and", "ate", "one", "apple", "and", "two", "pears", "."]
Tag = ["M", "M", "M", "M", "O", "A", "V", "V", "P", "O", "V", "P", "P", "O", "P", "P", "O"]
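
For reference, here is a minimal sketch of how adjacent tokens with the same tag could be merged into chunks. The helper name `chunk_by_tag` and its `outside` parameter are illustrative only and not part of crf-suite. The sketch assumes that a run of identical tags always forms one chunk, so two neighbouring objects with no "O" token between them would be merged.

```python
from itertools import groupby

tokens = ["Due", "to", "health", "concerns", ",", "Donald", "stopped", "eating",
          "sweets", "and", "ate", "one", "apple", "and", "two", "pears", "."]
tags = ["M", "M", "M", "M", "O", "A", "V", "V",
        "P", "O", "V", "P", "P", "O", "P", "P", "O"]


def chunk_by_tag(tokens, tags, outside="O"):
    """Merge runs of adjacent tokens that share a tag into (tag, text) chunks.

    Hypothetical helper: tokens tagged `outside` are dropped, but they still
    act as separators between runs.
    """
    chunks = []
    for tag, group in groupby(zip(tokens, tags), key=lambda pair: pair[1]):
        if tag == outside:
            continue
        chunks.append((tag, " ".join(token for token, _ in group)))
    return chunks


chunks = chunk_by_tag(tokens, tags)
for tag, text in chunks:
    print(tag, text)
```

Because the "and" tokens are tagged "O", "one apple" and "two pears" come out as separate "P" chunks.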

While clustering the tokens by function is comparatively easy, what is the best way to add dependency information so that I end up with the following table, with one entry per object in each list:

Motivation = ["Due to health concerns", "Due to health concerns", "Due to health concerns"]
Actor = ["Donald", "Donald", "Donald"]
Action = ["stopped eating", "ate", "ate"]
Object = ["sweets", "one apple", "two pears"]

I looked into chunking with spaCy but haven't had much success, mainly because the tagging needs to be heavily customized.
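
To make the target table concrete, one naive way to produce it is a carry-forward pass over already-merged chunks: remember the most recent motivation, actor, and action, and emit one row per object. This is only a sketch of a heuristic, not real dependency parsing. The function name `chunks_to_rows` and the `(tag, text)` chunk format are assumptions made for illustration, and the approach would break on sentences where, for example, the motivation follows the object.

```python
def chunks_to_rows(chunks):
    """Build one row per object chunk ("P").

    Assumption: each object depends on the most recently seen motivation
    ("M"), actor ("A"), and action ("V"). This is a positional heuristic,
    not a syntactic dependency analysis.
    """
    rows = []
    current = {"M": None, "A": None, "V": None}
    for tag, text in chunks:
        if tag == "P":
            rows.append({
                "Motivation": current["M"],
                "Actor": current["A"],
                "Action": current["V"],
                "Object": text,
            })
        elif tag in current:
            current[tag] = text
    return rows


# (tag, text) chunks for the example sentence, written out by hand here.
chunks = [
    ("M", "Due to health concerns"),
    ("A", "Donald"),
    ("V", "stopped eating"),
    ("P", "sweets"),
    ("V", "ate"),
    ("P", "one apple"),
    ("P", "two pears"),
]

rows = chunks_to_rows(chunks)
columns = ("Motivation", "Actor", "Action", "Object")
table = {col: [row[col] for row in rows] for col in columns}
for col in columns:
    print(col, "=", table[col])
```

For the example sentence this reproduces the four lists above, but only because every object happens to follow its actor and action in the text.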
