I have found this JCabi snippet that works well with UTF-8 encoded XML files. It reads the XML file and then prints it as a string.
import com.jcabi.xml.XML;
import com.jcabi.xml.XMLDocument;
import java.io.File;
import java.io.FileNotFoundException;

XML xml;
try {
    xml = new XMLDocument(new File("test8.xml"));
    String xmlString = xml.toString();
    System.out.println(xmlString);
} catch (FileNotFoundException e1) {
    e1.printStackTrace();
}
However, when I run this same code on a UTF-16 encoded XML file, it gives me the following error:
[Fatal Error] :1:1: Content is not allowed in prolog.
Exception in thread "AWT-EventQueue-0" java.lang.IllegalArgumentException: Can't parse, most probably the XML is invalid
Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
I have read about this error, and it means the parser is not recognizing the prolog: because of the encoding, it is seeing characters at the start of the file that are not supposed to be there.
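You can see what the parser actually receives by dumping the first few bytes of the file in hex. This is just a small diagnostic sketch (assuming the UTF-16 file is named test16.xml); a UTF-16 LE file with a BOM starts with the bytes FF FE, and even the "<" of the prolog arrives as two bytes, which is why the parser complains at line 1, column 1:

import java.nio.file.Files;
import java.nio.file.Paths;

public class DumpHead {
    public static void main(String[] args) throws Exception {
        // Print the first 8 bytes of the file in hex.
        byte[] head = Files.readAllBytes(Paths.get("test16.xml"));
        for (int i = 0; i < Math.min(8, head.length); i++) {
            // Mask to 0..255 so negative bytes print correctly.
            System.out.printf("%02X ", head[i] & 0xFF);
        }
        // A UTF-16 LE file with a BOM typically prints:
        // FF FE 3C 00 3F 00 78 00   (BOM, then "<?x")
    }
}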
I have tried other libraries that offer a way to "tell" the parser which encoding the source file uses. The only library I was able to get working to some degree was JCabi, but I could not find a way to tell it that my source file is encoded in UTF-16.
Thanks, any help is appreciated.
The jcabi XMLDocument has various constructors, including one which takes a String. So one approach is to read the file into a String using the correct charset, and pass that to the constructor. This makes use of java.nio.charset.StandardCharsets and java.nio.file.Files.
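For example, something along these lines (the file name test16.xml is my assumption):

import com.jcabi.xml.XML;
import com.jcabi.xml.XMLDocument;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadUtf16Xml {
    public static void main(String[] args) throws IOException {
        // Decode the file to a String first, naming the charset.
        // StandardCharsets.UTF_16 detects byte order from a leading BOM.
        String content = new String(
            Files.readAllBytes(Paths.get("test16.xml")),
            StandardCharsets.UTF_16);
        // XMLDocument(String) parses the already-decoded text, so the
        // file's byte encoding no longer matters to the parser.
        XML xml = new XMLDocument(content);
        System.out.println(xml.toString());
    }
}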
In my first test, my XML file was encoded as UTF-16 LE, with a BOM at the start (FF FE for little-endian). The above approach handled the BOM OK.

My test file's prolog is as follows, with no explicit encoding attribute (maybe that's a bad thing, here?):

<?xml version="1.0"?>
In my second test I removed the BOM and re-ran with the updated file - which also worked.
I used Notepad++ and a hex editor to verify/select encodings & to edit the test files.
Your file may be different from my test files (BE vs. LE).
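One caveat: Java's UTF-16 decoder assumes big-endian when there is no BOM, so if your file is little-endian with no BOM, you would need to name the byte order explicitly. A sketch, with the file name assumed:

import com.jcabi.xml.XML;
import com.jcabi.xml.XMLDocument;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadUtf16LeXml {
    public static void main(String[] args) throws IOException {
        // UTF_16LE fixes the byte order to little-endian (and expects
        // no BOM); UTF_16BE is the big-endian equivalent.
        String content = new String(
            Files.readAllBytes(Paths.get("test16-le.xml")),
            StandardCharsets.UTF_16LE);
        System.out.println(new XMLDocument(content).toString());
    }
}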