I have been trying to find a program that can extract information from unstructured text (news articles, books, etc.).
My eventual goal is to create a program that can take regular sentences and store them in a database, much like Google does but without all the duplicate information.
Let's take the NLTK example: "At eight o'clock on Thursday morning Arthur didn't feel very good."
The things that I would want extracted are:
time: 8:00am
date: Thursday
person: Arthur
action: didn't feel good
Is there a program that can do this?
I have tried using NLTK, but I can't seem to find a good way to extract this information.
This problem is called fine-grained entity recognition. No, there are no tools (except for research work) that can add such semantics. To start with, you can recognise Person and Time with appropriate models using an entity recogniser.
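To make the Person and Time part concrete, here is a minimal rule-based sketch in plain Python. It is not a trained entity recogniser, only a stand-in to show the kind of output you would want from one. The function name, the word lists, and the "capitalised word that is not a weekday" rule for people are all my own assumptions for illustration, and they will break on anything much different from the example sentence.

```python
import re

# Hypothetical lookup tables for this sketch; a real recogniser learns these.
WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday")
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
                "eleven": 11, "twelve": 12}


def extract_time_date_person(sentence):
    """Rough rule-based extraction of time, weekday and person name."""
    result = {"time": None, "date": None, "person": None}
    lowered = sentence.lower()

    # Time: "<number word> o'clock", with am/pm guessed from the part of day.
    hour_pattern = r"\b(" + "|".join(NUMBER_WORDS) + r") o'clock\b"
    match = re.search(hour_pattern, lowered)
    if match:
        hour = NUMBER_WORDS[match.group(1)]
        if re.search(r"\bmorning\b", lowered):
            suffix = "am"
        elif re.search(r"\b(afternoon|evening|night)\b", lowered):
            suffix = "pm"
        else:
            suffix = ""  # part of day unknown, leave it ambiguous
        result["time"] = f"{hour}:00{suffix}"

    # Date: the first weekday name mentioned.
    match = re.search(r"\b(" + "|".join(WEEKDAYS) + r")\b", lowered)
    if match:
        result["date"] = match.group(1).capitalize()

    # Person: first capitalised word that is neither sentence-initial
    # nor a weekday (a crude assumption, not real name recognition).
    tokens = re.findall(r"[A-Za-z']+", sentence)
    for token in tokens[1:]:
        if token[0].isupper() and token.lower() not in WEEKDAYS:
            result["person"] = token
            break

    return result


sentence = "At eight o'clock on Thursday morning Arthur didn't feel very good."
print(extract_time_date_person(sentence))
```

For real text you would replace these rules with a trained model, since hand-written patterns like these do not generalise.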
You can recognise the actions by parsing the sentence, as suggested by @Junuxx.
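As a rough idea of what the action step could look like, the sketch below works on tokens that are already part-of-speech tagged and collects the first verb together with the adverbs, verbs and adjectives that directly follow it. This is a simple chunking heuristic, not a full parse. The tags are written out by hand in Penn Treebank style for the example sentence (a real tagger may produce different tags), and the function name and tag sets are assumptions for illustration.

```python
# Hand-written (word, tag) pairs for the example sentence; assumed, not tagger output.
TAGGED = [
    ("At", "IN"), ("eight", "CD"), ("o'clock", "JJ"), ("on", "IN"),
    ("Thursday", "NNP"), ("morning", "NN"), ("Arthur", "NNP"),
    ("did", "VBD"), ("n't", "RB"), ("feel", "VB"), ("very", "RB"),
    ("good", "JJ"), (".", "."),
]

# Tags that may start the action, and tags that may continue it.
VERB_TAGS = {"VB", "VBD", "VBP", "VBZ", "MD"}
CONTINUE_TAGS = VERB_TAGS | {"RB", "JJ"}


def extract_action(tagged):
    """Return the first verb group plus its trailing adverbs/adjectives."""
    words = []
    for word, tag in tagged:
        if not words:
            # Skip everything until the first verb-like token.
            if tag in VERB_TAGS:
                words.append(word)
        elif tag in CONTINUE_TAGS:
            words.append(word)
        else:
            break  # the verb group has ended

    # Re-attach clitics such as "n't" to the preceding word.
    text = ""
    for word in words:
        if text and not word.startswith(("n't", "'")):
            text += " "
        text += word
    return text


print(extract_action(TAGGED))
```

A proper parser gives you the subject and object as well, which this heuristic ignores.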
Also give Wikify a try.
Thank you.