I have a text like this:
Take a loot at some of the first confirmed Forum speakers: John Sequiera Graduated in Biology at Facultad de Ciencias Exactas y Naturales,University of Buenos Aires, Argentina. In 2004 obtained a PhD in Biology (Molecular Neuroscience), at University of Buenos Aires, mentored by Prof. Marcelo Rubinstein. Between 2005 and 2008 pursued postdoctoral training at Pasteur Institute (Paris) mentored by Prof Jean-Pierre Changeux, to investigate the role of nicotinic receptors in executive behaviors. Motivated by a deep interest in investigating human neurological diseases, in 2009 joined the Institute of Psychiatry at King’s College London where she performed basic research with a translational perspective in the field of neurodegeneration. Since 2016 has been chief of instructors / Adjunct professor at University of Buenos Aires, Facultad de Ciencias Exactas y Naturales. Tom Gonzalez is a professor of Neuroscience at the Sussex Neuroscience, School of Life Sciences, University of Sussex. Prof. Baden studies how neurons and networks compute, using the beautiful collection of circuits that make up the vertebrate retina as a model.
I want the output to be:
[{"person" : "John Sequiera" , "content": "Graduated in Biology at Facultad...."},{"person" : "Tom Gonzalez" , "content": "is a professor of Neuroscience at the Sussex..."}]
So the goal is to run NER, use each person entity as "person", and put into "content" all the text that follows that name until the next person is detected in the text.
Is this possible?
I tried to use spaCy to extract the named entities, but I am having difficulty getting the content:
import spacy

nlp = spacy.load("en_core_web_lg")
doc = nlp(text)  # text is the string shown above
for ent in doc.ents:
    # print each detected entity with its label
    print(ent.text, ent.label_)
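To make the desired grouping concrete, here is a minimal sketch of the "content runs from one person to the next" idea. The helper names `split_by_person` and `person_spans_from_spacy` are hypothetical, made up for illustration. The sketch assumes the English spaCy models label people as `PERSON` (not `PER`) and relies on the character offsets `ent.start_char` and `ent.end_char`.

```python
from typing import Dict, List, Tuple


def split_by_person(text: str, person_spans: List[Tuple[int, int]]) -> List[Dict[str, str]]:
    """Group text into {"person", "content"} records (hypothetical helper).

    person_spans holds (start_char, end_char) offsets of the names that
    open a section. Content runs from the end of one name to the start
    of the next name, or to the end of the text for the last one.
    """
    spans = sorted(person_spans)
    records = []
    for i, (start, end) in enumerate(spans):
        next_start = spans[i + 1][0] if i + 1 < len(spans) else len(text)
        records.append({
            "person": text[start:end],
            "content": text[end:next_start].strip(),
        })
    return records


def person_spans_from_spacy(text: str, model: str = "en_core_web_lg") -> List[Tuple[int, int]]:
    """Collect offsets of every PERSON entity (hypothetical helper).

    Assumption: the model tags people with the label "PERSON".
    """
    import spacy  # imported here so split_by_person works without spaCy

    nlp = spacy.load(model)
    doc = nlp(text)
    return [(ent.start_char, ent.end_char) for ent in doc.ents if ent.label_ == "PERSON"]
```

Calling `split_by_person(text, person_spans_from_spacy(text))` would split on every detected person, so names mentioned inside a biography (for example the mentors) would also start a new record. Some extra filtering of the spans, which is not shown here, would be needed to keep only the speakers.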