I am trying to extract some information from an ONIX XML file using Python's lxml parser.
The part of the document I am interested in looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<ProductSupply>
<SupplyDetail>
<Supplier>
<SupplierRole>03</SupplierRole>
<SupplierName>EGEN</SupplierName>
</Supplier>
<ProductAvailability>40</ProductAvailability>
<Price>
<PriceType>01</PriceType>
<PriceAmount>0.00</PriceAmount>
<Tax>
<TaxType>01</TaxType>
<TaxRateCode>Z</TaxRateCode>
<TaxRatePercent>0</TaxRatePercent>
<TaxableAmount>0.00</TaxableAmount>
<TaxAmount>0.00</TaxAmount>
</Tax>
<CurrencyCode>NOK</CurrencyCode>
</Price>
<Price>
<PriceType>02</PriceType>
<PriceQualifier>05</PriceQualifier>
<PriceAmount>0.00</PriceAmount>
<Tax>
<TaxType>01</TaxType>
<TaxRateCode>Z</TaxRateCode>
<TaxRatePercent>0</TaxRatePercent>
<TaxableAmount>0.00</TaxableAmount>
<TaxAmount>0.00</TaxAmount>
</Tax>
<CurrencyCode>NOK</CurrencyCode>
</Price>
</SupplyDetail>
</ProductSupply>
I need to pick up the price amount with the following conditions:
PriceType='02' and CurrencyCode='NOK' and PriceQualifier='05'
I tried:
price = p.find(
"ProductSupply/SupplyDetail[Supplier/SupplierRole='03']/Price[PriceType='02' \
and CurrencyCode='NOK' and PriceQualifier='05']/PriceAmount").text
For some reason my XPath with the and operator does not work, and I get the following error:
File "<string>", line unknown
SyntaxError: invalid predicate
Any idea how to approach it? Any assistance is highly appreciated!
TL;DR: Use xpath() because boolean operators like and are not supported by the find*() methods.

As Daniel suggested, you should use lxml's parser method xpath() for your (rather complex) XPath expression.

XPath

Your XPath expression contains node tests and predicates which use the boolean operator and (XPath 1.0):

ProductSupply/SupplyDetail[Supplier/SupplierRole='03']/Price[PriceType='02' and CurrencyCode='NOK' and PriceQualifier='05']/PriceAmount

Tip: Test it online (see Xpather demo). This asserts that it finds a single element <PriceAmount>0.00</PriceAmount> as expected.

Using find() methods

According to the Python docs, you can use the following find methods, which accept a match expression (e.g. XPath) as an argument:

find
findall

Issue: limited XPath syntax support for find()

However, the XPath syntax they support is limited.
This limitation includes logical operators like your and. Karl Thornton explains this on his page XML parsing: Python ~ XPath ~ logical AND | Shiori.

On the other hand, a note in the lxml documentation prefers them:

(emphasis mine)
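
To make the limitation concrete, here is a minimal sketch using the standard library's xml.etree.ElementTree, whose find() uses the same limited path syntax that lxml's find() is modelled on. The sample is a trimmed copy of the XML from the question, wrapped in a hypothetical <Product> element (an assumption, so that the path can start at ProductSupply). The supplier-role condition is left out because the limited syntax only compares a direct child tag inside a predicate. Chaining one predicate per condition is shown as a possible way to stay within find(); treat it as an illustration, not a full replacement for the original expression.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML; the <Product> wrapper is an assumption.
SAMPLE = """
<Product>
  <ProductSupply>
    <SupplyDetail>
      <Supplier><SupplierRole>03</SupplierRole></Supplier>
      <Price>
        <PriceType>01</PriceType>
        <PriceAmount>0.00</PriceAmount>
        <CurrencyCode>NOK</CurrencyCode>
      </Price>
      <Price>
        <PriceType>02</PriceType>
        <PriceQualifier>05</PriceQualifier>
        <PriceAmount>0.00</PriceAmount>
        <CurrencyCode>NOK</CurrencyCode>
      </Price>
    </SupplyDetail>
  </ProductSupply>
</Product>
"""

product = ET.fromstring(SAMPLE)

# Predicate with "and": not part of the limited syntax accepted by find().
and_path = (
    "ProductSupply/SupplyDetail/Price"
    "[PriceType='02' and CurrencyCode='NOK' and PriceQualifier='05']"
    "/PriceAmount"
)
try:
    product.find(and_path)
    and_error = None
except SyntaxError as exc:
    and_error = str(exc)

# One predicate per condition: each [tag='text'] test is supported on its own.
chained_path = (
    "ProductSupply/SupplyDetail/Price"
    "[PriceType='02'][CurrencyCode='NOK'][PriceQualifier='05']"
    "/PriceAmount"
)
matches = product.findall(chained_path)

print("find() with 'and' failed:", and_error)
print("chained predicates matched:", [m.text for m in matches])
```
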

Using lxml's xpath()

So let's start with the safer and richer xpath() function (before premature optimization).
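
A minimal sketch of the xpath() approach, assuming lxml is installed. The sample is a trimmed copy of the XML from the question, wrapped in a hypothetical <Product> element (an assumption, so that the question's path starting at ProductSupply applies unchanged), and without namespaces, as in the question. Note that xpath() returns a list rather than a single element, so the result is checked before .text is read.

```python
from lxml import etree

# Trimmed copy of the question's XML; the <Product> wrapper is an assumption.
SAMPLE = """
<Product>
  <ProductSupply>
    <SupplyDetail>
      <Supplier>
        <SupplierRole>03</SupplierRole>
        <SupplierName>EGEN</SupplierName>
      </Supplier>
      <Price>
        <PriceType>01</PriceType>
        <PriceAmount>0.00</PriceAmount>
        <CurrencyCode>NOK</CurrencyCode>
      </Price>
      <Price>
        <PriceType>02</PriceType>
        <PriceQualifier>05</PriceQualifier>
        <PriceAmount>0.00</PriceAmount>
        <CurrencyCode>NOK</CurrencyCode>
      </Price>
    </SupplyDetail>
  </ProductSupply>
</Product>
"""

product = etree.fromstring(SAMPLE)

# Same expression as in the question, evaluated as full XPath 1.0.
expr = (
    "ProductSupply/SupplyDetail[Supplier/SupplierRole='03']"
    "/Price[PriceType='02' and CurrencyCode='NOK' and PriceQualifier='05']"
    "/PriceAmount"
)

amounts = product.xpath(expr)  # list of matching <PriceAmount> elements
price = amounts[0].text if amounts else None
print(price)
```

If the real ONIX file declares a default namespace, the element names in the expression would need a prefix mapped through the namespaces argument of xpath(); the sample above has none, so that step is omitted.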