VTDGen throws ParseException: File too big


I am trying to parse an XML file with the VTD-XML library (VTDGen). It worked perfectly until I tried a file larger than 1 GB.

This is the code I use to parse it:

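            VTDGen vg = new VTDGen();
            in = new SmbFileInputStream(fileToGet);
            // The (int) cast limits this approach to files under 2 GB.
            byte[] b = new byte[(int) fileToGet.length()];
            // A single read() may return fewer bytes than requested, so loop until the buffer is full.
            int off = 0, n;
            while (off < b.length && (n = in.read(b, off, b.length - off)) > 0) {
                off += n;
            }
            in.close();
            vg.setDoc(b);
            vg.parse(true); // true = namespace-aware parsing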

This is the error I get:

com.ximpleware.ParseException: Other error: file size too big >=1GB 

Is there any way to increase the size limit, or should I switch to another parser?

Thank you in advance.


There are 2 best solutions below

2
On

Read about the limitations of VTD:

  • Upper limits of various fields: (1) For starting tags (the max Qname length is 2048; the prefix 512), overflow conditions result in parse exceptions. For other tokens (upper limit is 1M), one can potentially break a long token into multiple shorter ones. (2) Depth field overflow condition results in parse exceptions. (3) Starting offset: Currently the biggest document supported is 1G characters (1G bytes or 2G bytes, depending on actual document encoding).

From http://vtd-xml.sourceforge.net/userGuide/0.html

0
On

There are two ways to get around the issue:

  1. Use extended VTD-XML. It is part of the vtd-xml distribution, shares a very similar API, but is a standalone product by itself.
  2. Turn off namespace awareness, i.e. call vg.parse(false) instead of vg.parse(true). That raises the maximum document size from 1 GB to 2 GB.
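
To make the choice between the two options concrete, here is a minimal sketch that picks an approach from the file size. The class name VtdSizeCheck, the enum, and the method choose are hypothetical and not part of VTD-XML. The 1 GB and 2 GB thresholds are taken from the error message and the answers here, and the sketch assumes a single-byte encoding where one character is one byte.

```java
// Sketch only: the thresholds come from the error message and the answers
// in this thread, and assume one byte per character (e.g. ASCII or UTF-8).
class VtdSizeCheck {
    static final long ONE_GB = 1L << 30;
    static final long TWO_GB = 1L << 31;

    // Hypothetical labels for the approaches discussed above.
    enum Strategy {
        STANDARD_NS_AWARE,       // standard VTDGen, vg.parse(true)
        STANDARD_NO_NAMESPACES,  // standard VTDGen, vg.parse(false)
        EXTENDED                 // extended VTD-XML
    }

    static Strategy choose(long sizeInBytes) {
        if (sizeInBytes < 0) {
            throw new IllegalArgumentException("size must not be negative");
        }
        if (sizeInBytes < ONE_GB) {
            return Strategy.STANDARD_NS_AWARE;
        }
        // A Java byte[] cannot hold 2 GB or more, so this is also the
        // ceiling for the setDoc(byte[]) approach used in the question.
        if (sizeInBytes < TWO_GB) {
            return Strategy.STANDARD_NO_NAMESPACES;
        }
        return Strategy.EXTENDED;
    }

    public static void main(String[] args) {
        long size = 1536L * 1024 * 1024; // a 1.5 GB file
        System.out.println(size + " bytes -> " + choose(size));
    }
}
```

In the question's code, fileToGet.length() would be the value passed to choose. Keep in mind that turning off namespace awareness changes how prefixed names are handled, so it only fits if the queries do not depend on namespaces.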