Parsing XML in Python


I have a large XML file and need to extract data from particular elements in it and write only that data to another file. The file contains a number of conversations, each with an id, and each message in a conversation has an author tag holding the author's id. I do not need the texts from all authors, only from the specific authors whose ids I have. How do I write a function that selects and writes out only the conversations where the author is id1, id2, id3, etc.? This is what the document looks like:

 <conversations>
  <conversation id="e621da5de598c9321a1d505ea95e6a2d">
    <message line="1">
      <author>97964e7a9e8eb9cf78f2e4d7b2ff34c7</author>
      <time>03:20</time>
      <text>Hola.</text>
    </message>
    <message line="2">
      <author>0158d0d6781fc4d493f243d4caa49747</author>
      <time>03:20</time>
      <text>hi.</text>
    </message>
  </conversation>
  <conversation id="3c517e43554b6431f932acc138eed57e">
    <message line="1">
      <author>505166bca797ceaa203e245667d56b34</author>
      <time>18:11</time>
      <text>hi</text>
    </message>
    <message line="2">
      <author>505166bca797ceaa203e245667d56b34</author>
      <time>18:11</time>
      <text>Aujourd.</text>
    </message>
    <message line="3">
      <author>4b66cb4831680c47cc6b66060baff894</author>
      <time>18:11</time>
      <text>hey</text>
    </message>
  </conversation>

   </conversations> 

There is 1 best solution below

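import xml.etree.ElementTree as ET

# Parse the whole file into memory.
tree = ET.parse('conversations.xml')

# iter() walks every element in document order, starting at the root.
for node in tree.iter():
    if node.tag == "conversations":
        continue  # root wrapper, nothing to print
    if node.tag == "conversation":
        print("\n")  # visual break, new conversation
        print("{} {}".format(node.tag, node.attrib))  # attrib holds the id
        continue
    if node.tag == "message":
        print("{} {}".format(node.tag, node.attrib))  # attrib holds the line number
        continue
    # Remaining tags (author, time, text) carry their data as text.
    print("{} {}".format(node.tag, node.text))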

Using the above, you should be able to check for an author id with similar logic. If you are searching for 97964e7a9e8eb9cf78f2e4d7b2ff34c7 and other ids, put them in a list, set, or dict and test each author element's text against it:

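# Ids of the authors you want to find; add more as needed.
authors = ['97964e7a9e8eb9cf78f2e4d7b2ff34c7']
for node in tree.iter():
    # An author element stores the id as its text content.
    if node.tag == "author" and node.text in authors:
        print('found')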
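
To go from printing 'found' to writing the matching conversations into another file, one possible approach is to drop every conversation that has no message by a wanted author and then save what is left. The following is only a sketch under a few assumptions: the function name filter_conversations and its parameters are made up for illustration, a conversation is kept whole as soon as any one of its messages matches, and the file is small enough to parse into memory in one go.

```python
import xml.etree.ElementTree as ET


def filter_conversations(source, destination, wanted_authors):
    """Write only conversations with a message by a wanted author.

    Hypothetical helper, not part of ElementTree. Returns the number
    of conversations that were kept.
    """
    wanted = set(wanted_authors)  # set gives fast membership tests
    tree = ET.parse(source)
    root = tree.getroot()  # the <conversations> element

    kept = 0
    # Loop over a copy of the children, because we remove while looping.
    for conversation in list(root.findall("conversation")):
        # Collect every author id that appears in this conversation.
        authors = {
            (author.text or "").strip()
            for author in conversation.iter("author")
        }
        if authors & wanted:
            kept += 1  # at least one wanted author took part
        else:
            root.remove(conversation)

    # Save the pruned tree to the output file.
    tree.write(destination, encoding="utf-8", xml_declaration=True)
    return kept
```

With the list from above, a call such as filter_conversations('conversations.xml', 'selected.xml', authors) would write the reduced document to selected.xml (an example file name). If you only want the individual messages rather than whole conversations, the same loop can be applied one level down, removing message elements from each conversation instead.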