Getting values from an XML file that has deep keys and values


I have a very large XML file produced by an application. Part of its tree is shown below:

[Image: XML tree]

There are several items under 'item', numbered 0 to 7. They are always named with numbers, and the count can range from 0 to any number. Each of these items contains multiple child items, all with the same structure as in the tree above. Only the items 0 to 7 vary; the rest of the structure stays the same. Each of them has a <bbmds_questiontype> value, which can be Multiple Choice, Matching, or Essay.

What I need is a list of the values of <mat_formattedtext>, i.e. the output should look like this:

    <0>
        <bbmds_questiontype>Multiple Choice</bbmds_questiontype>
        <mat_formattedtext>This is first question </mat_formattedtext>
    </0>
    <1>
        <bbmds_questiontype>Multiple Choice</bbmds_questiontype>
        <mat_formattedtext>This is second question </mat_formattedtext>
    </1>
    <2>
        <bbmds_questiontype>Essay</bbmds_questiontype>
        <mat_formattedtext>This is first question </mat_formattedtext>
    </2>
    ...
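A minimal sketch with the standard-library ElementTree, assuming the real file is plain (un-namespaced) XML whose <item> elements sit at questestinterop/assessment/section/item, the same path used in the xmltodict snippet. The SAMPLE string is a hypothetical stand-in: only its tag names come from the question, and the nesting between <item> and the two tags of interest is invented. Because `.//tag` searches all descendants, the code does not depend on that nesting. It also assumes the first <mat_formattedtext> under each item is the question text, since answer choices may carry their own <mat_formattedtext> further down.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: tag names from the question, nesting invented for illustration.
SAMPLE = """<questestinterop><assessment><section>
  <item>
    <itemmetadata><bbmds_questiontype>Multiple Choice</bbmds_questiontype></itemmetadata>
    <presentation><flow><material><mat_extension>
      <mat_formattedtext>This is first question </mat_formattedtext>
    </mat_extension></material></flow></presentation>
  </item>
  <item>
    <itemmetadata><bbmds_questiontype>Essay</bbmds_questiontype></itemmetadata>
    <presentation><flow><material><mat_extension>
      <mat_formattedtext>This is second question </mat_formattedtext>
    </mat_extension></material></flow></presentation>
  </item>
</section></assessment></questestinterop>"""


def extract_questions(root):
    """Return one (question type, question text) pair per <item>, in document order."""
    pairs = []
    # Same path as doc['questestinterop']['assessment']['section']['item'];
    # root is <questestinterop> itself, so the path starts one level below it.
    for item in root.findall("assessment/section/item"):
        # ".//tag" matches at any depth below <item>; findtext returns the first match.
        qtype = item.findtext(".//bbmds_questiontype", default="")
        text = item.findtext(".//mat_formattedtext", default="")
        pairs.append((qtype, text))
    return pairs


def render(pairs):
    """Format the pairs like the desired output, numbering items from 0."""
    lines = []
    for i, (qtype, text) in enumerate(pairs):
        lines.append(f"<{i}>")
        lines.append(f"    <bbmds_questiontype>{qtype}</bbmds_questiontype>")
        lines.append(f"    <mat_formattedtext>{text}</mat_formattedtext>")
        lines.append(f"</{i}>")
    return "\n".join(lines)


root = ET.fromstring(SAMPLE)
print(render(extract_questions(root)))
```

For the real file, `root = ET.parse("C:/Users/SS/Desktop/moodlexml/00001_questions.dat").getroot()` would replace the `fromstring` call. If the export declares an XML namespace, the tag names in the paths would need the `{uri}` prefix that ElementTree uses.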

I have tried several solutions, including ElementTree and xmltodict, but they all get complicated because the filters have to be applied across different branches of the children.
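Since the file is described as very large, one alternative to loading the whole tree is ElementTree's `iterparse`, which reports each element once its closing tag has been parsed. This is a sketch under stated assumptions: the SAMPLE bytes are a hypothetical stand-in using the question's tag names with invented nesting, and the first <mat_formattedtext> under each <item> is assumed to be the question text.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: tag names from the question, nesting invented for illustration.
SAMPLE = b"""<questestinterop><assessment><section>
  <item><itemmetadata><bbmds_questiontype>Multiple Choice</bbmds_questiontype></itemmetadata>
    <presentation><material><mat_formattedtext>This is first question </mat_formattedtext></material></presentation></item>
  <item><itemmetadata><bbmds_questiontype>Matching</bbmds_questiontype></itemmetadata>
    <presentation><material><mat_formattedtext>This is second question </mat_formattedtext></material></presentation></item>
</section></assessment></questestinterop>"""


def stream_questions(source):
    """Yield (question type, question text) for each <item> while parsing incrementally."""
    # "end" events fire when a closing tag is read, so each <item> is complete here.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "item":
            qtype = elem.findtext(".//bbmds_questiontype", default="")
            text = elem.findtext(".//mat_formattedtext", default="")
            yield qtype, text.strip()
            # Empty the finished item so its children can be garbage-collected.
            elem.clear()


for i, (qtype, text) in enumerate(stream_questions(io.BytesIO(SAMPLE))):
    print(i, qtype, text)
```

With the real file, the path string can be passed straight to `stream_questions`, because `iterparse` accepts either a filename or a binary file object.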

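import xmltodict

# Parse the whole export into nested dicts and lists
with open("C:/Users/SS/Desktop/moodlexml/00001_questions.dat") as fd:
    doc = xmltodict.parse(fd.read())

# Several sibling <item> elements come back as a list (a single one as a dict)
shortened = doc['questestinterop']['assessment']['section']['item']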

Any advice on how to proceed would be appreciated.
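One way to continue from the xmltodict result without hard-coding every branch is a small recursive search over the nested dicts and lists it returns. `find_first` and `as_text` below are hypothetical helpers, not part of xmltodict. They rely on xmltodict's default conventions: element text is stored under `#text` when the element also has attributes (which get an `@` prefix), empty elements become `None`, and repeated <item> elements become a list while a single one stays a dict. The `shortened` literal is a hand-written imitation of that output, not real data.

```python
def find_first(node, key):
    """Depth-first search through nested dicts/lists (xmltodict output) for `key`.

    Returns the first non-None value stored under `key`, or None if there is none.
    Hypothetical helper, not part of xmltodict.
    """
    if isinstance(node, dict):
        if node.get(key) is not None:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None  # plain strings and None have no children
    for child in children:
        found = find_first(child, key)
        if found is not None:
            return found
    return None


def as_text(value):
    """Text of an xmltodict value: a plain string, or the '#text' entry of a dict."""
    if isinstance(value, dict):
        return value.get("#text", "")
    if isinstance(value, list):  # the tag was repeated directly under its parent
        return as_text(value[0]) if value else ""
    return value or ""


# Hand-written imitation of doc['questestinterop']['assessment']['section']['item'].
shortened = [
    {"itemmetadata": {"bbmds_questiontype": "Multiple Choice"},
     "presentation": {"flow": {"material": {"mat_extension": {
         "mat_formattedtext": {"@type": "HTML", "#text": "This is first question"}}}}}},
    {"itemmetadata": {"bbmds_questiontype": "Essay"},
     "presentation": {"flow": {"material": {"mat_extension": {
         "mat_formattedtext": "This is second question"}}}}},
]

# A single <item> comes back as a dict rather than a list, so normalize first.
items = shortened if isinstance(shortened, list) else [shortened]
questions = [
    (as_text(find_first(item, "bbmds_questiontype")),
     as_text(find_first(item, "mat_formattedtext")))
    for item in items
]
for i, (qtype, text) in enumerate(questions):
    print(i, qtype, text)
```

As an alternative to the `isinstance` check, xmltodict's `force_list` argument (for example `force_list=('item',)`) asks the parser to always return <item> as a list.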

Answer by ShockerSam

Have you tried parsing it with bs4 (BeautifulSoup)? It's simple.

Check it out: https://linuxhint.com/parse_xml_python_beautifulsoup/
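A minimal sketch of the bs4 suggestion, assuming the `beautifulsoup4` package is installed. It uses Python's built-in "html.parser" backend; `BeautifulSoup(text, "xml")` is the XML-aware option but requires lxml. html.parser lowercases tag names, which is harmless here because the tags involved are already lowercase. The SAMPLE string is a hypothetical stand-in using the question's tag names with invented nesting, and the first <mat_formattedtext> in each <item> is assumed to be the question text.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: tag names from the question, nesting invented for illustration.
SAMPLE = """<questestinterop><assessment><section>
  <item><itemmetadata><bbmds_questiontype>Multiple Choice</bbmds_questiontype></itemmetadata>
    <presentation><material><mat_formattedtext>This is first question </mat_formattedtext></material></presentation></item>
  <item><itemmetadata><bbmds_questiontype>Essay</bbmds_questiontype></itemmetadata>
    <presentation><material><mat_formattedtext>This is second question </mat_formattedtext></material></presentation></item>
</section></assessment></questestinterop>"""

soup = BeautifulSoup(SAMPLE, "html.parser")

questions = []
for item in soup.find_all("item"):
    # find() searches every descendant, so the nesting between <item> and the tag does not matter.
    qtype = item.find("bbmds_questiontype")
    text = item.find("mat_formattedtext")
    questions.append((
        qtype.get_text(strip=True) if qtype else "",
        text.get_text(strip=True) if text else "",
    ))

for i, (qtype, text) in enumerate(questions):
    print(i, qtype, text)
```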