I've looked and tried multiple things however have not been able to read the nested tags in my XML file. I have extracted the outter tags value and not the nested street and city tags under the address tag. I'm on a time crunch and I'm unable to read the nested tags after trying a bunch of things. PLEASE HELP!!!
The expected outcome I'm trying to get it is ---->
Common Botanical zone light price Street City
bloodroot Sanguinaria canadensis 4 mostly shady 2.44 1 toronto
And so on ---->
However, I have not been able to retrieve the street and city columns because my code is not picking up the nested tags.
I have been able to achieve the following output by removing the code involving city and street tags.
Common Botanical zone light price
bloodroot Sanguinaria canadensis 4 mostly shady 2.44
The following is my xml file with 2 entries only for testing purposes. I'm trying to create columns of each text information under the plant tag also mentioned above. I'm reading using the databricks file system. I open and create a csv and write to it and then close it. The indentation is correct, it may have been mixed up when I was copy pasting.
<?xml version="1.0" encoding="UTF-8"?>
<CATALOG>
<PLANT>
<COMMON>Bloodroot</COMMON>
<BOTANICAL>Sanguinaria canadensis</BOTANICAL>
<ZONE>4</ZONE>
<LIGHT>Mostly Shady</LIGHT>
<PRICE>$2.44</PRICE>
<ADDRESS>
<STREET>1</STREET>
<CITY>toronto</CITY>
</ADDRESS>
<AVAILABILITY>031599</AVAILABILITY>
</PLANT>
<PLANT>
<COMMON>Columbine</COMMON>
<BOTANICAL>Aquilegia canadensis</BOTANICAL>
<ZONE>3</ZONE>
<LIGHT>Mostly Shady</LIGHT>
<PRICE>$9.37</PRICE>
<ADDRESS>
<STREET>2</STREET>
<CITY>montreal</CITY>
</ADDRESS>
<AVAILABILITY>030699</AVAILABILITY>
</PLANT>
</CATALOG>
-----------This is the code I have used ---------------
from xml.etree import ElementTree
import csv
import os
xml = ElementTree.parse("/dbfs/mnt/ods-outbound/xml_test/plant_catalog.xml")
#creating a file
csvfile= open("/dbfs/mnt/ods-outbound/xml_test/plant_catalog.csv",'w',encoding='utf-8')
csvfile_writer=csv.writer(csvfile)
# ADD THE HEADER TO CSV FILE
csvfile_writer.writerow(["common","botanical","zone","light","price","availability","street","city"])
# FOR EACH PLANT
for plant in xml.findall("PLANT"):
if(plant)
# EXTRACT PLANT DETAILS
common = plant.find("COMMON")
botanical = plant.find("BOTANICAL")
zone = plant.find("ZONE")
light = plant.find("LIGHT")
price = plant.find("PRICE")
availability = plant.find("AVAILABILITY")
street = plant.find("STREET")
city = plant.find("CITY")
csv_line = [common.text, botanical.text, zone.text, light.text, price.text, availability.text,street.text,city.text]
# ADD A NEW ROW TO CSV FILE
csvfile_writer.writerow(csv_line)
csvfile.close()
As per the xml file, the street and city values are inside address tag.
changes:a.xml(input file):main.py:plant_catalog.csv(output file):