I am trying to extract some information from an ONIX XML file using the Python lxml parser.
Among other things, the part of the document I am interested in looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<ProductSupply>
<SupplyDetail>
<Supplier>
<SupplierRole>03</SupplierRole>
<SupplierName>EGEN</SupplierName>
</Supplier>
<ProductAvailability>40</ProductAvailability>
<Price>
<PriceType>01</PriceType>
<PriceAmount>0.00</PriceAmount>
<Tax>
<TaxType>01</TaxType>
<TaxRateCode>Z</TaxRateCode>
<TaxRatePercent>0</TaxRatePercent>
<TaxableAmount>0.00</TaxableAmount>
<TaxAmount>0.00</TaxAmount>
</Tax>
<CurrencyCode>NOK</CurrencyCode>
</Price>
<Price>
<PriceType>02</PriceType>
<PriceQualifier>05</PriceQualifier>
<PriceAmount>0.00</PriceAmount>
<Tax>
<TaxType>01</TaxType>
<TaxRateCode>Z</TaxRateCode>
<TaxRatePercent>0</TaxRatePercent>
<TaxableAmount>0.00</TaxableAmount>
<TaxAmount>0.00</TaxAmount>
</Tax>
<CurrencyCode>NOK</CurrencyCode>
</Price>
</SupplyDetail>
</ProductSupply>
I need to pick up the price amount with the following conditions:
PriceType='02' and CurrencyCode='NOK' and PriceQualifier='05'
I tried:
price = p.find(
"ProductSupply/SupplyDetail[Supplier/SupplierRole='03']/Price[PriceType='02' \
and CurrencyCode='NOK' and PriceQualifier='05']/PriceAmount").text
For some reason my XPath with "and" operators does not work, and I get the following error:
File "<string>", line unknown
SyntaxError: invalid predicate
Any idea how to approach it? Any assistance is highly appreciated!
TL;DR: Use xpath(), because boolean operators like "and" are not supported by the find*() methods.
As Daniel suggested, you should use lxml's xpath() method for your (rather complex) XPath expression.
XPath
Your XPath expression contains node tests and predicates which use the boolean operator "and" (XPath 1.0):
ProductSupply/SupplyDetail[Supplier/SupplierRole='03']/Price[PriceType='02' and CurrencyCode='NOK' and PriceQualifier='05']/PriceAmount
Tip: Test it online (see Xpather demo). This confirms that it finds a single element, <PriceAmount>0.00</PriceAmount>, as expected.
Using find() methods
According to the Python docs, you can use the following find methods, which accept a match expression (e.g. XPath) as argument:
find
findall
Issue: limited XPath syntax support for find()
However, their supported XPath syntax is limited! This limitation includes logical operators like your "and". Karl Thornton explains this on his page XML parsing: Python ~ XPath ~ logical AND | Shiori.
On the other hand, a note in the lxml documentation prefers them:
(emphasis mine)
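To make the find() limitation concrete, here is a minimal sketch using the standard library's xml.etree.ElementTree. Its find() rejects the "and" predicate with a SyntaxError, and the error message in the question suggests lxml's find() behaves the same way. The chained-predicate path is a suggested workaround, not something from the original answer. The sample data is a trimmed copy of the question's XML, with the first PriceAmount changed to 1.00 (an assumption for illustration) so the two prices can be told apart.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's ONIX fragment.
# Assumption: the first PriceAmount is changed to 1.00 for illustration only.
ONIX = """\
<ProductSupply>
  <SupplyDetail>
    <Price>
      <PriceType>01</PriceType>
      <PriceAmount>1.00</PriceAmount>
      <CurrencyCode>NOK</CurrencyCode>
    </Price>
    <Price>
      <PriceType>02</PriceType>
      <PriceQualifier>05</PriceQualifier>
      <PriceAmount>0.00</PriceAmount>
      <CurrencyCode>NOK</CurrencyCode>
    </Price>
  </SupplyDetail>
</ProductSupply>
"""

root = ET.fromstring(ONIX)  # root is the <ProductSupply> element

# 1) A predicate using "and" is outside the limited syntax of find().
and_path = ("SupplyDetail/Price[PriceType='02' and CurrencyCode='NOK' "
            "and PriceQualifier='05']/PriceAmount")
try:
    root.find(and_path)
    error = None
except SyntaxError as exc:
    error = str(exc)

# 2) Several simple [tag='text'] predicates in a row are accepted,
#    and together they act like a logical AND.
chained_path = ("SupplyDetail/Price[PriceType='02'][CurrencyCode='NOK']"
                "[PriceQualifier='05']/PriceAmount")
amount = root.find(chained_path)

print("find() with 'and' failed:", error)
print("find() with chained predicates:", amount.text)
```

Note that this only covers predicates of the simple child-equals-text form. A nested path inside a predicate, such as Supplier/SupplierRole='03', is also outside the limited syntax, which is one more reason to reach for xpath().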
Using lxml's xpath()
So let's start with the safer and richer xpath() function (before premature optimization). For example:
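A minimal sketch of such a call, assuming lxml is installed and that the XML from the question is parsed with <ProductSupply> as the root element (so the path starts at SupplyDetail). The variable names are illustrative and not taken from the original post.

```python
from lxml import etree

# The XML fragment from the question, passed as bytes because it carries
# an encoding declaration.
ONIX = b"""<?xml version="1.0" encoding="UTF-8"?>
<ProductSupply>
  <SupplyDetail>
    <Supplier>
      <SupplierRole>03</SupplierRole>
      <SupplierName>EGEN</SupplierName>
    </Supplier>
    <ProductAvailability>40</ProductAvailability>
    <Price>
      <PriceType>01</PriceType>
      <PriceAmount>0.00</PriceAmount>
      <CurrencyCode>NOK</CurrencyCode>
    </Price>
    <Price>
      <PriceType>02</PriceType>
      <PriceQualifier>05</PriceQualifier>
      <PriceAmount>0.00</PriceAmount>
      <CurrencyCode>NOK</CurrencyCode>
    </Price>
  </SupplyDetail>
</ProductSupply>
"""

root = etree.fromstring(ONIX)  # root is the <ProductSupply> element

# Full XPath 1.0: nested paths and "and" are both allowed inside predicates.
expr = ("SupplyDetail[Supplier/SupplierRole='03']"
        "/Price[PriceType='02' and CurrencyCode='NOK' and PriceQualifier='05']"
        "/PriceAmount")

# xpath() returns a list of matching elements for a node-set expression,
# so guard against an empty result before reading .text.
matches = root.xpath(expr)
price = matches[0].text if matches else None

print(price)
```

Unlike find(), xpath() never returns None for a node-set expression. An empty list means nothing matched, so the explicit check replaces the AttributeError you would otherwise get from calling .text on a missing element.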