VTD fails to evaluate a "find all empty nodes with no attributes" xpath


I think I found a bug in version 2.13.4 of vtd-xml. In short, I have the following snippet:

String test = "<catalog><description></description></catalog>";
VTDGen vg = new VTDGen();
vg.setDoc(test.getBytes("UTF-8"));
vg.parse(true);
VTDNav vn = vg.getNav();
// select nodes with no children, no text and no attributes
String xpath = "/catalog//*[not(child::node()) and not(child::text()) and count(@*)=0]";
AutoPilot ap = new AutoPilot(vn);
ap.selectXPath(xpath);
// the body of this loop is never executed
while (ap.evalXPath() != -1) {
    System.out.println("current node " + vn.toRawString(vn.getCurrentIndex()));
}

This doesn't work: it finds no node, although it should find "description". The code above does work if I use a self-closed tag:

String test = "<catalog><description/></catalog>";

The point is that every other XPath evaluator works with both versions of the XML. Sadly, I receive the XML from an external source, so I have no control over it. Breaking the XPath down, I noticed that evaluating both

/catalog//*[not(child::node())]

and

/catalog//*[not(child::text())]

gives false as the result. As an additional check, I tried something like:

String xpath = "/catalog/description/text()";
ap.selectXPath(xpath);
if(ap.evalXPath()!=-1)
   System.out.println(vn.toRawString(vn.getCurrentIndex()));

And this prints an empty string, so VTD somehow "thinks" the node has a text child, albeit an empty one, whereas I expect no match. Any hints?
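For comparison, here is a minimal sketch of the same three checks using the JDK's built-in XPath engine (javax.xml.xpath) instead of vtd-xml. The class name JdkXPathCheck and the helper countMatches are illustrative only and are not part of any library. It shows the behaviour expected above: one match for both forms of the element, and no text node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class JdkXPathCheck {
    // Same expression as in the vtd-xml snippet above.
    static final String EMPTY_NODES =
        "/catalog//*[not(child::node()) and not(child::text()) and count(@*)=0]";

    // Returns how many nodes the XPath expression selects in the given XML string.
    static int countMatches(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing or XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String openClose = "<catalog><description></description></catalog>";
        String selfClosed = "<catalog><description/></catalog>";

        System.out.println(countMatches(openClose, EMPTY_NODES));                    // 1
        System.out.println(countMatches(selfClosed, EMPTY_NODES));                   // 1
        System.out.println(countMatches(openClose, "/catalog/description/text()"));  // 0
    }
}
```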

Stephan On BEST ANSWER

TL;DR

When I faced this issue, I was left with three main options (see below). I went for the second one: use XMLModifier to fix the VTDNav. At the bottom of my answer, you'll find an implementation of this option and a sample output.


The long story ...

I faced the same issue. Here are the three main options I first thought of (in order of difficulty):

1. Turn empty elements into self closed tags in the XML source.

This option isn't always possible (as in the OP's case). Moreover, it may be difficult to pre-process the XML beforehand.

2. Use XMLModifier to fix the VTDNav.

Find the empty elements with an xpath expression, replace them with self closed tags and rebuild the VTDNav.

2.bis Use XMLModifier#removeToken

A lower-level variant of the preceding solution would be to loop over the tokens in the VTDNav and remove the unnecessary ones with XMLModifier#removeToken.

3. Patch the vtd-xml code directly.

Taking this path may require more effort and more time. IMO, the optimized vtd-xml code isn't easy to grasp at first sight.
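For readers who can touch the XML text before parsing, Option 1 could look roughly like the sketch below. It is plain JDK regex code, not part of vtd-xml; the class name EmptyElementPreprocessor and the method selfClose are made up for illustration. It assumes ASCII element names and elements with nothing at all between the start and end tags. Comments or CDATA sections that happen to contain such text would also be rewritten, so treat it as a starting point rather than a general XML rewriter.

```java
import java.util.regex.Pattern;

class EmptyElementPreprocessor {
    // Group 1: element name. Group 2: optional attribute text (may be empty).
    // The look-behind (?<!/) skips tags that are already self-closed,
    // and the back-reference \1 requires the end tag to carry the same name.
    private static final Pattern EMPTY_ELEMENT = Pattern.compile(
        "<([A-Za-z_][\\w.:-]*)((?:\\s[^<>]*)?)(?<!/)></\\1\\s*>");

    // Rewrites every <name attrs></name> as <name attrs/>.
    static String selfClose(String xml) {
        return EMPTY_ELEMENT.matcher(xml).replaceAll("<$1$2/>");
    }

    public static void main(String[] args) {
        System.out.println(selfClose("<catalog><description></description></catalog>"));
        // prints <catalog><description/></catalog>
    }
}
```

Elements that contain only whitespace, such as `<a> </a>`, are deliberately left alone, because they do have a text child.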


Option 1 wasn't feasible in my case. I failed to implement Option 2.bis: the "unnecessary" tokens still remained. I didn't look at Option 3 because I didn't want to fix some (rather complex) third-party code.

I was left with Option 2. Here is an implementation:

Code

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import com.ximpleware.AutoPilot;
import com.ximpleware.NavException;
import com.ximpleware.VTDException;
import com.ximpleware.VTDGen;
import com.ximpleware.VTDNav;
import com.ximpleware.XMLModifier;

import org.junit.Test; // assumption: JUnit 4 (use org.junit.jupiter.api.Test with JUnit 5)

@Test
public void turnEmptyElementsIntoSelfClosedTags() throws VTDException, IOException {
    // STEP 1 : Load XML into VTDNav
    // * Convert the initial xml code into a byte array
    String xml = "<root><empty-element></empty-element><self-closed/><empty-element2 foo='bar'></empty-element2></root>";
    byte[] ba = xml.getBytes(StandardCharsets.UTF_8);

    // * Build VTDNav and dump it to screen
    VTDGen vg = new VTDGen();
    vg.setDoc(ba);
    vg.parse(false); // Use `true' to activate namespace support

    VTDNav nav = vg.getNav();
    dump("BEFORE", nav);


    // STEP 2 : Prepare to fix the VTDNav
    // * Prepare an autopilot to find empty elements
    AutoPilot ap = new AutoPilot(nav);
    ap.selectXPath("//*[count(child::node())=1][text()='']");

    // * Prepare a simple regex matcher to create self closed tags
    Matcher elementReducer = Pattern.compile("^<(.+)></.+>$").matcher("");


    // STEP 3 : Fix the VTDNav
    // * Instantiate an XMLModifier on the VTDNav
    XMLModifier xm = new XMLModifier(nav);
    ByteArrayOutputStream baos = new ByteArrayOutputStream(); // baos will hold the elements to fix
    String utf8 = StandardCharsets.UTF_8.name();

    // * Find all empty elements and replace them
    while (ap.evalXPath() != -1) {
        nav.dumpFragment(baos);
        String emptyElementXml = baos.toString(utf8);
        String selfClosingTagXml = elementReducer.reset(emptyElementXml).replaceFirst("<$1/>");

        xm.remove();
        xm.insertAfterElement(selfClosingTagXml);

        baos.reset();
    }

    // * Rebuild VTDNav and dump it to screen
    nav = xm.outputAndReparse(); // You MUST call this method to save all your changes
    dump("AFTER", nav);
}

private void dump(String msg, VTDNav nav) throws NavException, IOException {
    System.out.print(msg + ":\n  ");
    nav.dumpFragment(System.out);
    System.out.print("\n\n");
}

Output

BEFORE:
  <root><empty-element></empty-element><self-closed/><empty-element2 foo='bar'></empty-element2></root>

AFTER:
  <root><empty-element/><self-closed/><empty-element2 foo='bar'/></root>
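To see what the regex step does on its own, here is a small stand-alone sketch using only java.util.regex. The class name ElementReducerDemo and the method reduce are illustrative. The pattern is the one from the code above. It only makes sense on a fragment that is exactly one empty element, which is what the XPath expression is meant to select.

```java
import java.util.regex.Pattern;

class ElementReducerDemo {
    // Group 1 captures everything between the leading '<' and the "></" that
    // separates the start tag from the end tag: the element name plus any attributes.
    private static final Pattern EMPTY_FRAGMENT = Pattern.compile("^<(.+)></.+>$");

    // Turns "<name attrs></name>" into "<name attrs/>"; other input is returned unchanged.
    static String reduce(String fragment) {
        return EMPTY_FRAGMENT.matcher(fragment).replaceFirst("<$1/>");
    }

    public static void main(String[] args) {
        System.out.println(reduce("<empty-element></empty-element>"));             // <empty-element/>
        System.out.println(reduce("<empty-element2 foo='bar'></empty-element2>")); // <empty-element2 foo='bar'/>
        System.out.println(reduce("<self-closed/>"));                              // <self-closed/>
    }
}
```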