Reading XML file in Pandas

65 Views Asked by At

I can not read XML file in Pandas. I didn't find any solutions here and in web. In pandas documentation there is one similar example, but it didn't help

I tried to use xpath with //Cell. Maybe someone can help me. Thanks

 df=pd.read_xml(path, xpath=".//Cell")

XLM file looks like:


`<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
<Title>TestDay_EA NGF18_NGJ17,Daily 2008.07.01-2022.12.25</Title>
<Author>MetaQuotes Ltd.</Author>
<Revision>1</Revision>
<Created>2022-12-26T22:24:15Z</Created>
<Company>Copyright 2000-2022, MetaQuotes Ltd.</Company>
<Version>500</Version>
<Build>64</Build>
<Server>MetaQuotes-Demo</Server>
<Deposit>100000 USD</Deposit>
<Leverage>100</Leverage>
<Condition>0</Condition>
</DocumentProperties>
<Styles>
<Style ss:ID="ce0"><NumberFormat ss:Format="0"/></Style>
<Style ss:ID="ce1"><NumberFormat ss:Format="0.0"/></Style>
<Style ss:ID="ce2"><NumberFormat ss:Format="0.00"/></Style>
<Style ss:ID="ce3"><NumberFormat ss:Format="0.000"/></Style>
<Style ss:ID="ce4"><NumberFormat ss:Format="0.0000"/></Style>
<Style ss:ID="ce5"><NumberFormat ss:Format="0.00000"/></Style>
<Style ss:ID="ce6"><NumberFormat ss:Format="0.000000"/></Style>
<Style ss:ID="ce7"><NumberFormat ss:Format="0.0000000"/></Style>
<Style ss:ID="ce8"><NumberFormat ss:Format="0.00000000"/></Style>
<Style ss:ID="ce11"><NumberFormat ss:Format="Fixed"/></Style>
<Style ss:ID="ce12"><NumberFormat ss:Format="General Date"/></Style>
<Style ss:ID="ce13"><NumberFormat ss:Format="0.00000"/></Style>
</Styles>
<Worksheet ss:Name="Tester Optimizator Results">
<Table>
<Row>
<Cell><Data ss:Type="String">Pass</Data></Cell>
<Cell><Data ss:Type="String">Result</Data></Cell>
<Cell><Data ss:Type="String">Profit</Data></Cell>
<Cell><Data ss:Type="String">Expected Payoff</Data></Cell>
<Cell><Data ss:Type="String">Profit Factor</Data></Cell>
<Cell><Data ss:Type="String">Recovery Factor</Data></Cell>
<Cell><Data ss:Type="String">Sharpe Ratio</Data></Cell>
<Cell><Data ss:Type="String">Custom</Data></Cell>
<Cell><Data ss:Type="String">Equity DD %</Data></Cell>
<Cell><Data ss:Type="String">Trades</Data></Cell>
<Cell><Data ss:Type="String">MaPercent</Data></Cell>
<Cell><Data ss:Type="String">ma_period</Data></Cell>
<Cell><Data ss:Type="String">MA_Tral</Data></Cell>
<Cell><Data ss:Type="String">SL</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="Number">31381</Data></Cell>
<Cell ss:StyleID="ce2"><Data ss:Type="Number">145776.00</Data></Cell>
<Cell ss:StyleID="ce2"><Data ss:Type="Number">45776.00</Data></Cell>
<Cell ss:StyleID="ce13"><Data ss:Type="Number">3814.666667</Data></Cell>
<Cell ss:StyleID="ce13"><Data ss:Type="Number">9.710942</Data></Cell>
<Cell ss:StyleID="ce13"><Data ss:Type="Number">3.542760</Data></Cell>
<Cell ss:StyleID="ce13"><Data ss:Type="Number">0.443114</Data></Cell>
<Cell><Data ss:Type="Number">0</Data></Cell>
<Cell ss:StyleID="ce11"><Data ss:Type="Number">8.3061</Data></Cell>
<Cell><Data ss:Type="Number">12</Data></Cell>
<Cell><Data ss:Type="Number">2</Data></Cell>
<Cell><Data ss:Type="Number">55</Data></Cell>
<Cell><Data ss:Type="String">false</Data></Cell>
<Cell><Data ss:Type="Number">25</Data></Cell>
</Row>

I expect to recieve info like enter image description here

1

There are 1 best solutions below

0
On

A different approach: convert the necessary elements to csv and create dataframe of that csv file. The upside is that this will also work if the xml contains more rows than the example file shown here.

import pandas as pd
from lxml import etree
import csv

with open('csvfile.csv', 'w') as f:
    writer = csv.writer(f)

row = []
for action, el in  etree.iterparse('test.xml', tag = 'Data', recover = True):
    row.append(el.text)
    
    if len(row) == 14:
        writer.writerow([';'.join(row)])
        row.clear()

df = pd.read_csv('csvfile.csv', sep = ';')    
print (df)

yields:

    Pass    Result   Profit  Expected Payoff  Profit Factor  Recovery Factor  Sharpe Ratio  Custom  Equity DD %  Trades  MaPercent  ma_period  MA_Tral  SL
0  31381  145776.0  45776.0      3814.666667       9.710942          3.54276      0.443114       0       8.3061      12          2         55    False  25