Get the start and end positions of named entities found in XML


I'm new to XML parsing. I have an XML file that contains text content along with tagged entities (person and location). The file has close to 10 "person" entities and just 3 "location" entities.

<em>
Mad Max:
<location>Fury Road</location>
</em>

and so on.

I want to extract the content, plus the start and end position of each entity in the XML file, using Python (with a for loop). I'm not sure how to start writing the code to get these positions from the XML file.

Can someone please help me?


There is 1 best solution below

bobtho'-' On

Instead of scanning the file by hand with plain for-loops (which is fragile and could lead to problems later), you could use Python's built-in xml.etree.ElementTree module.

In your example:

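import xml.etree.ElementTree as ET

# xmlfile is a placeholder: the path to your XML file (or an open file object)
tree = ET.parse(xmlfile)
root = tree.getroot()  # top-level element of the document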

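ElementTree does not report positions directly, but from here you can walk the tree and work them out from each element's text and tail, or simply use this module for whatever else you were planning to do with the XML data.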
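As a rough sketch of how the positions could be computed: the question does not say what "position" means, so this assumes it means character offsets into the text with all tags stripped out. The helper name entity_spans and its tags parameter are made up for illustration and are not part of ElementTree. It walks the tree in document order, keeps a running character count, and records where each entity element starts and ends.

```python
import xml.etree.ElementTree as ET


def entity_spans(root, tags=("person", "location")):
    """Return (plain_text, spans); each span is (tag, start, end, entity_text).

    Offsets are character positions in the tag-stripped text, end exclusive.
    Hypothetical helper for illustration only.
    """
    parts = []   # text fragments in document order
    spans = []   # collected entity spans
    pos = 0      # running character offset into the plain text

    def walk(elem):
        nonlocal pos
        start = pos
        # Text that appears before the first child element
        if elem.text:
            parts.append(elem.text)
            pos += len(elem.text)
        for child in elem:
            walk(child)
            # Text that follows the child's closing tag belongs to the parent
            if child.tail:
                parts.append(child.tail)
                pos += len(child.tail)
        if elem.tag in tags:
            spans.append((elem.tag, start, pos, "".join(elem.itertext())))

    walk(root)
    return "".join(parts), spans


# Small inline sample; for a file, use ET.parse(path).getroot() instead
sample = "<em>Mad Max: <location>Fury Road</location></em>"
text, spans = entity_spans(ET.fromstring(sample))
for tag, start, end, value in spans:
    print(tag, start, end, repr(text[start:end]))
```

Whitespace and newlines in the file count toward the offsets exactly as they appear, so a pretty-printed file will give different numbers than a one-line one. If you need offsets into the raw file including the tags, this approach will not give them.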