In Java how do I evaluate XPATH expression on XML using SAX Parser?

I need a dynamic approach because the XML format is not fixed, so I should be able to pass the following:

  1. xpath as string
  2. xml as string / input source

Something like Utility.evaluate("/test/@id='123'", "")

There are 4 best solutions below

0
On

Oracle's XQuery processor for Java will "dynamically" stream path expressions: https://docs.oracle.com/database/121/ADXDK/adx_j_xqj.htm#ADXDK99930

Specifically, there is information on streaming here, including an example: https://docs.oracle.com/database/121/ADXDK/adx_j_xqj.htm#ADXDK119

But it will not stream using SAX. You must bind the input XML as either StAX, InputStream, or Reader to get streaming evaluation.

1
On

Here is an example:

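import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// First, parse the file into a DOM Document
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new File("test.xml"));

// Create an XPath instance
XPath xPath = XPathFactory.newInstance().newXPath();
String expression = "/company/employee";

// Select a NodeList using the XPath expression
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(doc, XPathConstants.NODESET);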

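EDIT:

If you want to use a SAX parser, you can't use Java's built-in XPath object, because it evaluates expressions against an in-memory node tree (DOM by default) rather than a stream of SAX events. The package documentation (https://docs.oracle.com/javase/7/docs/api/javax/xml/xpath/package-summary.html) says: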

The XPath language provides a simple, concise syntax for selecting nodes from an XML document. XPath also provides rules for converting a node in an XML document object model (DOM) tree to a boolean, double, or string value. XPath is a W3C-defined language and an official W3C recommendation; the W3C hosts the XML Path Language (XPath) Version 1.0 specification.

XPath started in life in 1999 as a supplement to the XSLT and XPointer languages, but has more recently become popular as a stand-alone language, as a single XPath expression can be used to replace many lines of DOM API code.

If you want to use SAX, you can look at the libraries listed in this question: "Is there any XPath processor for SAX model?"

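That said, the way XPath works does not really suit SAX. A SAX parser does not build an XML tree in memory; it only reports events as it reads the input. XPath therefore cannot be evaluated efficiently on top of it, because an expression may refer to nodes that have not been read yet or that have already been discarded.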
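For the signature asked for in the question (XPath as a string, XML as a string), a minimal sketch using only the JDK could look like the following. It relies on the standard `XPath.evaluate(String, InputSource, QName)` overload; the class and method names (`JdkXPathEval`, `evaluateBoolean`, `evaluateString`) are made up for illustration. Note that the JDK parses the input into a tree before evaluating, so this is convenient but not streaming.

```java
import java.io.StringReader;

import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.xml.sax.InputSource;

// Hypothetical helper; the name JdkXPathEval is not from the answers above.
final class JdkXPathEval {

    private JdkXPathEval() {
    }

    // Evaluates an XPath 1.0 expression against XML supplied as a string.
    // The JDK builds an in-memory tree from the InputSource first, so the
    // whole document is loaded: this is NOT streaming evaluation.
    static Object evaluate(String expression, String xml, QName returnType) {
        XPath xPath = XPathFactory.newInstance().newXPath();
        InputSource source = new InputSource(new StringReader(xml));
        try {
            return xPath.evaluate(expression, source, returnType);
        } catch (XPathExpressionException e) {
            // Wrapped so callers do not have to handle a checked exception.
            throw new IllegalArgumentException("Cannot evaluate: " + expression, e);
        }
    }

    static boolean evaluateBoolean(String expression, String xml) {
        return (Boolean) evaluate(expression, xml, XPathConstants.BOOLEAN);
    }

    static String evaluateString(String expression, String xml) {
        return (String) evaluate(expression, xml, XPathConstants.STRING);
    }

    public static void main(String[] args) {
        String xml = "<test id='123'/>";
        System.out.println(evaluateBoolean("/test/@id='123'", xml)); // true
        System.out.println(evaluateString("/test/@id", xml));        // 123
    }
}
```

This is fine for small or medium documents. For very large inputs the memory cost is the same as building a DOM yourself.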

0
On

Only a small subset of XPath is amenable to streamed evaluation, that is, evaluation on-the-fly while parsing the input document. There are therefore not many streaming XPath processors around; most of them are the product of academic research projects.

One thing you could try is Saxon-EE streamed XQuery. This is a small subset of XQuery that allows streamed execution (it will allow expressions like your example). Details at

http://www.saxonica.com/documentation/#!sourcedocs/streaming/streamed-query
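To illustrate what "a small subset amenable to streaming" can mean in practice, here is a hand-rolled sketch that uses plain SAX to evaluate only absolute paths of the form /a/b/@attr (element names followed by one attribute step). It is not Saxon's API and not a real XPath processor; the class name `StreamingAttributePath` and its `evaluate` method are hypothetical. It ignores namespaces, predicates, wildcards, and every other XPath feature.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: supports only paths like /a/b/@attr.
final class StreamingAttributePath extends DefaultHandler {

    private final List<String> elementSteps;               // e.g. [test, item]
    private final String attributeName;                    // e.g. id
    private final List<String> openElements = new ArrayList<>(); // root first
    private final List<String> matches = new ArrayList<>();

    StreamingAttributePath(String path) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("Unsupported path: " + path);
        }
        String[] parts = path.substring(1).split("/");
        String last = parts[parts.length - 1];
        if (parts.length < 2 || !last.startsWith("@")) {
            throw new IllegalArgumentException("Unsupported path: " + path);
        }
        elementSteps = new ArrayList<>(Arrays.asList(parts).subList(0, parts.length - 1));
        attributeName = last.substring(1);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        openElements.add(qName);
        // The attribute is available as soon as the start tag is seen,
        // so no part of the document needs to be buffered.
        if (openElements.equals(elementSteps)) {
            String value = atts.getValue(attributeName);
            if (value != null) {
                matches.add(value);
            }
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        openElements.remove(openElements.size() - 1);
    }

    // Parses the XML once and returns matching attribute values in document order.
    static List<String> evaluate(String path, String xml) {
        StreamingAttributePath handler = new StreamingAttributePath(path);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
        return handler.matches;
    }

    public static void main(String[] args) {
        String xml = "<test id='123'><item id='a'/><item id='b'/></test>";
        System.out.println(evaluate("/test/@id", xml));      // [123]
        System.out.println(evaluate("/test/item/@id", xml)); // [a, b]
    }
}
```

A comparison such as /test/@id='123' from the question would then be done by the caller, for example by checking whether the returned list contains "123". Anything that needs to look backwards or sideways in the document does not fit this model, which is why general XPath needs a tree.";
if (!StreamingAttributePath.evaluate("/test/@id", sampleXml).equals(java.util.Arrays.asList("123"))) throw new AssertionError("root attribute");
if (!StreamingAttributePath.evaluate("/test/item/@id", sampleXml).equals(java.util.Arrays.asList("a", "b"))) throw new AssertionError("child attributes");
if (!StreamingAttributePath.evaluate("/test/missing/@id", sampleXml).isEmpty()) throw new AssertionError("no match expected");
boolean rejected = false;
try { StreamingAttributePath.evaluate("test/item", sampleXml); } catch (IllegalArgumentException e) { rejected = true; }
if (!rejected) throw new AssertionError("unsupported path should be rejected");
</test>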

0
On

You can use a SAXSource with XPath using Saxon, but - and this is important - be aware that the underlying implementation will almost certainly still be loading and buffering some or all of the document in memory in order to evaluate the XPath expression. It probably won't be a full DOM tree (Saxon relies on its own structure called TinyTree, which supports lazy-loading and various other optimizations), so it's better than using most DOM implementations, but it still involves loading the document into memory. If your concern is memory load for large data sets, it probably won't help you much, and you'd be better off using one of the streaming XPath/XQuery options suggested by others.

An implementation of your utility method might look something like this:

import java.io.StringReader;

import javax.xml.namespace.QName;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;

import org.xml.sax.InputSource;

import net.sf.saxon.xpath.XPathFactoryImpl;

public class XPathUtils {

    public static Object evaluate(String xpath, String xml, QName returnType)
            throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        InputSource source = new InputSource(new StringReader(xml));
        SAXSource saxSource = new SAXSource(parser.getXMLReader(), source);
        XPath xPath = new XPathFactoryImpl().newXPath();
        return xPath.evaluate(xpath, saxSource, returnType);
    }

    public static String xpathString(String xpath, String xml)
            throws Exception {
        return (String) evaluate(xpath, xml, XPathConstants.STRING);
    }

    public static boolean xpathBool(String xpath, String xml) throws Exception {
        return (Boolean) evaluate(xpath, xml, XPathConstants.BOOLEAN);
    }

    public static Number xpathNumber(String xpath, String xml) throws Exception {
        return (Number) evaluate(xpath, xml, XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(xpathString("/root/@id", "<root id='12345'/>"));
    }
}

This works because the Saxon XPath implementation supports SAXSource as a context for evaluate(). Be aware that trying this with the built-in Apache XPath implementation will throw an exception.