Flattening form using PDFClown throws IndexOutOfBounds exception


I'm using PDFClown-0.2.0 to flatten this pdf file. This is the code I have:

import org.pdfclown.documents.Document;
import org.pdfclown.files.File;
import org.pdfclown.files.SerializationModeEnum;
import org.pdfclown.tools.FormFlattener;

public class Sample {
    public static void main(String args[]){
        try {
            File f = new File("label.pdf");
            Document doc = f.getDocument();

            FormFlattener formFlattener = new FormFlattener();
            formFlattener.flatten(doc);
            f.save(SerializationModeEnum.Standard);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

I'm following the instructions provided at http://pdfclown.org/2014/09/12/waiting-for-pdf-clown-0-2-0-release/#FormFlattening. However, when I run the code, I get the following error:

java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at java.util.ArrayList.rangeCheck(ArrayList.java:653)
    at java.util.ArrayList.get(ArrayList.java:429)
    at org.pdfclown.objects.PdfArray.get(PdfArray.java:314)
    at org.pdfclown.documents.interaction.forms.FieldWidgets.get(FieldWidgets.java:135)
    at org.pdfclown.documents.interaction.forms.FieldWidgets$1.next(FieldWidgets.java:380)
    at org.pdfclown.documents.interaction.forms.FieldWidgets$1.next(FieldWidgets.java:1)
    at org.pdfclown.tools.FormFlattener.flatten(FormFlattener.java:74)
    at com.narvar.webservices.returns.retailers.Sample.main(Sample.java:18)

What am I doing wrong? Just a note that the pdf was generated using PDFBox, and I had made the form fields readonly.

Best answer

Having debugged into the code, this looks like a PdfClown bug:

The Iterator returned by org.pdfclown.documents.interaction.forms.FieldWidgets.iterator() does not recognize that the widget collection underneath has changed (gotten smaller) and so tries to read beyond its size.

In detail:

org.pdfclown.tools.FormFlattener.flatten(Document) iterates over the widgets of a field:

  for(Widget widget : field.getWidgets())

but inside this loop it removes the current widget from the Kids of the current field:

    // Removing the field references relating the widget...
    PdfDictionary fieldPartDictionary = widget.getBaseDataObject();
    while (fieldPartDictionary != null)
    {
      [...]
      kidsArray.remove(fieldPartDictionary.getReference());
      [...]
    }

Thus, the collection over which the outer for iterates changes. Unfortunately, the Iterator used here is not aware of changes in the base collection:

return new Iterator<Widget>()
{
  /** Index of the next item. */
  private int index = 0;
  /** Collection size. */
  private final int size = size();

  @Override
  public boolean hasNext( )
  {return (index < size);}

  @Override
  public Widget next( )
  {
    if(!hasNext()) throw new NoSuchElementException();
    return get(index++);
  }

  @Override
  public void remove( )
  {throw new UnsupportedOperationException();}
};

As you can see, the iterator is neither notified of changes to the base collection nor does it check the collection itself. It even keeps its own idea of the collection size, namely the size at the time the Iterator was created, stored in size.

Such an Iterator implementation is fine for collections that do not change, which can be enforced by architecture or by contract. In the case at hand I see neither: the architecture obviously allows the collection to change, and there is no hint that the iterator in question may only be used with stable base collections.

This should be fixed.
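
The failure mode described above can be reproduced without PDF Clown. The following is only a minimal sketch: SnapshotList is a hypothetical stand-in for FieldWidgets (it is not part of the library), written so that its iterator caches the size once, like the one quoted above. With two elements, removing the current element inside the loop makes the second next() call read index 1 of a one-element list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class SnapshotIteratorDemo {
    /** Hypothetical stand-in for FieldWidgets: its iterator caches the size once. */
    static class SnapshotList implements Iterable<String> {
        private final List<String> items = new ArrayList<>();

        void add(String item) { items.add(item); }
        boolean remove(String item) { return items.remove(item); }
        int size() { return items.size(); }

        @Override
        public Iterator<String> iterator() {
            final int size = items.size(); // fixed at creation, never refreshed
            return new Iterator<String>() {
                private int index = 0;

                @Override
                public boolean hasNext() { return index < size; }

                @Override
                public String next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    return items.get(index++);
                }
            };
        }
    }

    /** Removes each element while iterating; returns true if the iteration failed. */
    static boolean removeWhileIterating() {
        SnapshotList widgets = new SnapshotList();
        widgets.add("widget-1");
        widgets.add("widget-2");
        try {
            for (String widget : widgets) {
                widgets.remove(widget); // shrinks the backing list mid-loop
            }
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true; // the stale size let next() read past the end
        }
    }

    public static void main(String[] args) {
        System.out.println("Iteration failed: " + removeWhileIterating());
    }
}
```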

A solution approach

A solution can be attempted by changing FormFlattener.flatten to retrieve a local copy of the widgets and iterate over this copy, e.g. by replacing

  for(Widget widget : field.getWidgets())

with

  List<Widget> widgets = new ArrayList<Widget>(field.getWidgets());
  for(Widget widget : widgets)
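
The effect of iterating over a copy can be sketched in isolation as well. Again, SnapshotList is a hypothetical stand-in for FieldWidgets, not library code. Because this stand-in is only an Iterable, the copy is built with a plain loop before any removal happens. Whether the ArrayList copy constructor shown above works on the real FieldWidgets depends on how that class implements the Collection methods, which is not verified here.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class CopyBeforeIteratingDemo {
    /** Hypothetical stand-in for FieldWidgets: its iterator caches the size once. */
    static class SnapshotList implements Iterable<String> {
        private final List<String> items = new ArrayList<>();

        void add(String item) { items.add(item); }
        boolean remove(String item) { return items.remove(item); }
        int size() { return items.size(); }

        @Override
        public Iterator<String> iterator() {
            final int size = items.size(); // fixed at creation, never refreshed
            return new Iterator<String>() {
                private int index = 0;

                @Override
                public boolean hasNext() { return index < size; }

                @Override
                public String next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    return items.get(index++);
                }
            };
        }
    }

    /** Removes every element by looping over a copy; returns the remaining size. */
    static int removeAllViaCopy() {
        SnapshotList widgets = new SnapshotList();
        widgets.add("widget-1");
        widgets.add("widget-2");

        // Take the copy while the collection is still unchanged.
        List<String> copy = new ArrayList<>();
        for (String widget : widgets) {
            copy.add(widget);
        }

        // Mutating the original is now safe: the loop walks the copy.
        for (String widget : copy) {
            widgets.remove(widget);
        }
        return widgets.size();
    }

    public static void main(String[] args) {
        System.out.println("Remaining widgets: " + removeAllViaCopy());
    }
}
```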