Unable to parse element attribute with XOM

Question

175 Views Asked by Stevoisiak At 18 November 2016 at 22:49

I'm attempting to parse an RSS feed using the XOM Java library. Each entry's image URL is stored as an attribute of the <img> element, as seen below.

<rss version="2.0">
  <channel>
    <item>
      <title>Decision Paralysis</title>
      <link>https://xkcd.com/1801/</link>
      <description>
        <img src="https://imgs.xkcd.com/comics/decision_paralysis.png"/>
      </description>
      <pubDate>Mon, 20 Feb 2017 05:00:00 -0000</pubDate>
      <guid>https://xkcd.com/1801/</guid>
    </item>
  </channel>
</rss>

Calling descElement.getFirstChildElement("img") returns null, so my code crashes with a NullPointerException when I try to read the src attribute. Why is my program failing to read the <img> element, and how can I read it properly?

import nu.xom.*;

public class RSSParser {
    public static void main(String[] args) {
        try {
            Builder parser = new Builder();
            Document doc = parser.build("https://xkcd.com/rss.xml");
            Element rootElement = doc.getRootElement();
            Element channelElement = rootElement.getFirstChildElement("channel");
            Elements itemList = channelElement.getChildElements("item");

            // Iterate through itemList
            for (int i = 0; i < itemList.size(); i++) {
                Element item = itemList.get(i);
                Element descElement = item.getFirstChildElement("description");
                Element imgElement = descElement.getFirstChildElement("img");
                // Crashes with NullPointerException
                String imgSrc = imgElement.getAttributeValue("src");
            }
        }
        catch (Exception error) {
            error.printStackTrace();
            System.exit(1);
        }
    }
}

Original Q&A

**Elliotte Rusty Harold** · Answer 1 · 2016-11-30T23:11:42.373000

There is no img element in the item. Try

  if (imgElement != null) {
    String imgSrc = imgElement.getAttributeValue("src");
  }

What the item contains is this:

<description>&lt;img
    src="http://imgs.xkcd.com/comics/us_state_names.png"
    title="Technically DC isn't a state, but no one is too pedantic about it because they don't want to disturb the snakes."
    alt="Technically DC isn't a state, but no one is too pedantic about it because they don't want to disturb the snakes." /&gt;
</description>

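That's not an img element. It's plain text: the angle brackets are escaped as &lt; and &gt;, so the parser reports character data inside <description>, not a child element.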
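To illustrate the point, here is a minimal sketch of a two-pass approach: read the description's text, then parse that text as its own small XML document. It uses the JDK's built-in DOM parser (javax.xml.parsers) instead of XOM so that it runs without extra dependencies; in XOM the first step would be getValue() on the description element. The class and method names are made up for this example, and the second pass assumes the embedded markup is well-formed XML (a self-closing <img ... />), which may not hold for every feed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of XOM or the original question.
final class EscapedDescriptionDemo {

    // Parses an XML string into a DOM document; wraps checked exceptions.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    // Returns the src attribute of the <img> tag stored as escaped text
    // inside the <description> of the given <item> markup.
    static String imgSrcFromItem(String itemXml) {
        Element item = parse(itemXml).getDocumentElement();
        Element description =
                (Element) item.getElementsByTagName("description").item(0);
        // The parser has already unescaped &lt; and &gt;, so this string
        // now reads <img ... /> but is still text, not an element.
        String markup = description.getTextContent().trim();
        // Second pass: parse that string as a standalone document.
        Element img = parse(markup).getDocumentElement();
        return img.getAttribute("src");
    }

    public static void main(String[] args) {
        String item = "<item><description>"
                + "&lt;img src=\"http://imgs.xkcd.com/comics/us_state_names.png\""
                + " alt=\"States\" /&gt;"
                + "</description></item>";
        System.out.println(imgSrcFromItem(item));
    }
}
```

If the embedded markup is not well-formed, the second parse throws, in which case a text-based extraction is the fallback.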

**Stevoisiak** · Answer 2 · 2017-02-17T16:20:20.707000

I managed to come up with a somewhat hacky solution using regex and pattern matching.

// Iterate through itemList
for (int i = 0; i < itemList.size(); i++) {
    Element item = itemList.get(i);
    String descString = item.getFirstChildElement("description").getValue();

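    // Parse image URL (hacky); needs java.util.regex.Pattern and java.util.regex.Matcher
    String imgSrc = "";
    Pattern pattern = Pattern.compile("src=\"[^\"]*\"");
    Matcher matcher = pattern.matcher(descString);
    if (matcher.find()) {
        // Skip the 5 characters of src=" and drop the closing quote
        imgSrc = descString.substring(matcher.start() + 5, matcher.end() - 1);
    }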
}
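A slightly tidier variant of the same idea, offered as a sketch: a capturing group lets the regex engine return the URL directly, which avoids the offset arithmetic, and the pattern is compiled once instead of on every loop iteration. The class name ImgSrcRegex and the method firstSrc are made up for this example, and it still assumes the src value is wrapped in double quotes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
final class ImgSrcRegex {

    // Compiled once; group 1 captures the text between the quotes after src=.
    private static final Pattern SRC = Pattern.compile("src=\"([^\"]*)\"");

    // Returns the first src value found in the text, or "" if there is none.
    static String firstSrc(String text) {
        Matcher matcher = SRC.matcher(text);
        return matcher.find() ? matcher.group(1) : "";
    }

    public static void main(String[] args) {
        String desc = "<img src=\"https://imgs.xkcd.com/comics/decision_paralysis.png\" alt=\"a\" />";
        System.out.println(firstSrc(desc));
    }
}
```

Inside the loop, this would be called as firstSrc(descString).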
