How to extract values from an XML tree with Python?

I have an API query that returns the XML tree below, and I'd like to pull certain values out of it. In particular, I'd like to extract values such as LinksInCount.

<aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
<aws:Response xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11">
<aws:OperationRequest>
<aws:RequestId>5486794a-0d03-4d47-a45b-e95764c3f0ee</aws:RequestId>
</aws:OperationRequest>
<aws:UrlInfoResult>
<aws:Alexa>
  
  <aws:ContentData>
    <aws:DataUrl type="canonical">yahoo.com/</aws:DataUrl>
    <aws:Asin>B00006D2TC</aws:Asin>
    <aws:SiteData>
      <aws:Title>Yahoo!</aws:Title>
      <aws:Description>Personalized content and search options. Chatrooms, free e-mail, clubs, and pager.</aws:Description>
      <aws:OnlineSince>18-Jan-1995</aws:OnlineSince>
    </aws:SiteData>
    <aws:Speed>
      <aws:MedianLoadTime>2242</aws:MedianLoadTime>
      <aws:Percentile>51</aws:Percentile>
    </aws:Speed>
    <aws:AdultContent>no</aws:AdultContent>
    <aws:Language>
      <aws:Locale>en</aws:Locale>
    </aws:Language>
    <aws:LinksInCount>76894</aws:LinksInCount>
    <aws:OwnedDomains>
      <aws:OwnedDomain>
        <aws:Domain>yahooligans.com</aws:Domain>
        <aws:Title>yahooligans.com</aws:Title>
      </aws:OwnedDomain>
    </aws:OwnedDomains>
  </aws:ContentData>
  
  <aws:Related>
    <aws:DataUrl type="canonical">yahoo.com/</aws:DataUrl>
    <aws:Asin>B00006D2TC</aws:Asin>
    <aws:RelatedLinks>
      <aws:RelatedLink>
        <aws:DataUrl type="canonical">aol.com/</aws:DataUrl>
        <aws:NavigableUrl>http://aol.com/</aws:NavigableUrl>
        <aws:Asin>B00006ARD3</aws:Asin>
        <aws:Relevance>301</aws:Relevance>
      </aws:RelatedLink>
    </aws:RelatedLinks>
    <aws:Categories>
      <aws:CategoryData>
        <aws:Title>On the Web/Web Portals</aws:Title>
        <aws:AbsolutePath>Top/Computers/Internet/On_the_Web/Web_Portals</aws:AbsolutePath>
      </aws:CategoryData>
    </aws:Categories>
  </aws:Related>        
        
  <aws:TrafficData>
    <aws:DataUrl type="canonical">yahoo.com/</aws:DataUrl>
    <aws:Asin>B00006D2TC</aws:Asin>
    <aws:Rank>1</aws:Rank>
    <aws:UsageStatistics>
    
      <aws:UsageStatistic>
        <aws:TimeRange>
          <aws:Days>1</aws:Days>
        </aws:TimeRange>
        <aws:Rank>
          <aws:Value>1</aws:Value>
          <aws:Delta>+0</aws:Delta>
        </aws:Rank>
        <aws:Reach>
          <aws:Rank>
            <aws:Value>2</aws:Value>
            <aws:Delta>+0</aws:Delta>
          </aws:Rank>
          <aws:PerMillion>
            <aws:Value>252,500</aws:Value>
            <aws:Delta>-1%</aws:Delta>
          </aws:PerMillion>
        </aws:Reach>
        <aws:PageViews>
          <aws:PerMillion>
            <aws:Value>51,400</aws:Value>
            <aws:Delta>-1%</aws:Delta>
          </aws:PerMillion>
          <aws:Rank>
            <aws:Value>1</aws:Value>
            <aws:Delta>+0</aws:Delta>
          </aws:Rank>
          <aws:PerUser>
            <aws:Value>13.7</aws:Value>
            <aws:Delta>-1%</aws:Delta>
          </aws:PerUser>
        </aws:PageViews>
      </aws:UsageStatistic>
      
    </aws:UsageStatistics>
  </aws:TrafficData>
  
</aws:Alexa>
</aws:UrlInfoResult>
<aws:ResponseStatus xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
<aws:StatusCode>Success</aws:StatusCode>
</aws:ResponseStatus>
</aws:Response>
</aws:UrlInfoResponse> 

Once I have the parsed tree, I can get the response status with the following code:

elem = tree.find(".//{http://alexa.amazonaws.com/doc/2005-10-05/}StatusCode")
print(elem.text)

However, I'm not sure how to get the LinksInCount value, which is contained in this element:

 <aws:LinksInCount>76894</aws:LinksInCount>

I've tried both of the following, and neither works:

elem = tree.find(".//{http://alexa.amazonaws.com/doc/2005-10-05/}LinksInCount")
print(elem.text)

elem = tree.find("LinksInCount")
print(elem.text)

http://docs.aws.amazon.com/AlexaWebInfoService/latest/

Answer by Frerich Raabe:

It looks like you're using ElementTree; when given a bare tag name, the find method only searches immediate child elements of the current element. Try using iterfind instead, with a path that reaches descendants (one starting with .//).
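
One more thing worth checking in the XML from the question: the aws prefix is re-declared on aws:Response (to the awis.amazonaws.com namespace) and again on aws:ResponseStatus (back to the alexa.amazonaws.com namespace). That would explain why the StatusCode lookup succeeds while the same namespace fails for LinksInCount. The following is a minimal sketch on a trimmed copy of the response; the constant names ALEXA and AWIS are made up for illustration, and it assumes the standard library's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response from the question. Note that the "aws"
# prefix is bound to a different URI on aws:Response, and bound back to
# the original URI on aws:ResponseStatus.
XML = """<aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
<aws:Response xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11">
<aws:UrlInfoResult><aws:Alexa><aws:ContentData>
<aws:LinksInCount>76894</aws:LinksInCount>
</aws:ContentData></aws:Alexa></aws:UrlInfoResult>
<aws:ResponseStatus xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
<aws:StatusCode>Success</aws:StatusCode>
</aws:ResponseStatus>
</aws:Response>
</aws:UrlInfoResponse>"""

root = ET.fromstring(XML)

# Hypothetical helper constants: the two namespace URIs in Clark notation.
ALEXA = "{http://alexa.amazonaws.com/doc/2005-10-05/}"
AWIS = "{http://awis.amazonaws.com/doc/2005-07-11}"

# StatusCode lives in the alexa namespace, so this lookup succeeds.
status = root.find(".//" + ALEXA + "StatusCode")

# LinksInCount lives in the awis namespace; ".//" searches all descendants.
links = root.find(".//" + AWIS + "LinksInCount")

# Same element name with the alexa namespace: no match, find returns None.
missing = root.find(".//" + ALEXA + "LinksInCount")

# Correct namespace but no ".//": only direct children of root are checked.
direct = root.find(AWIS + "LinksInCount")

# iterfind accepts the same paths and yields every match.
all_links = [e.text for e in root.iterfind(".//" + AWIS + "LinksInCount")]

print(status.text)
print(links.text)
print(all_links)
```

Since find returns None when nothing matches, calling .text on the result of a failed lookup raises AttributeError, so it is safer to test the result before using it.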