Extracting vector graphics (lines and points) with pdfclown

I want to extract vector graphics (lines and points) from a PDF with PDF Clown. I have tried to wrap my head around the graphics sample, but I cannot figure out how the object model works for this. Can anyone please explain the relationships?

Best answer

You are right: up to and including the PDF Clown 0.1 series, high-level path modelling is not implemented (it would have been derived from ContentScanner.GraphicsWrapper).

The next release (0.2 series, due next month) will support a high-level representation of all graphics contents, including path objects (PathElement), through the new ContentModeller. Here is an example:

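import org.pdfclown.documents.contents.elements.ContentModeller;
import org.pdfclown.documents.contents.elements.GraphicsElement;
import org.pdfclown.documents.contents.elements.PathElement;
// Assumption: ContentMarker lives in the same package as Path.
import org.pdfclown.documents.contents.objects.ContentMarker;
import org.pdfclown.documents.contents.objects.Path;

import java.awt.geom.GeneralPath;
import java.util.List;

// 'page' is the page whose contents are being modelled;
// passing Path.class restricts the model to path objects.
for(GraphicsElement<?> element : ContentModeller.model(page, Path.class))
{
  PathElement pathElement = (PathElement)element;
  List<ContentMarker> markers = pathElement.getMarkers(); // content markers associated with the path
  pathElement.getBox(); // bounding box of the path
  GeneralPath path = pathElement.getPath(); // path geometry as a java.awt.geom shape
  pathElement.isFilled(); // whether the path is filled
  pathElement.isStroked(); // whether the path is stroked
}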

In the meantime, you can extract the low-level representation of the vector graphics by iterating over the content stream with ContentScanner, as shown in ContentScanningSample (available in the downloadable distribution), and looking for path-related operations (BeginSubpath, DrawLine, DrawRectangle, DrawCurve, ...).
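To give a feeling for what those path-related operations carry, here is a minimal sketch that does not use PDF Clown at all. It walks an already-decoded content stream and collects straight segments from the PDF operators that BeginSubpath, DrawLine, DrawRectangle and DrawCurve correspond to (m, l, re and c/v/y), plus h for closing a subpath. The class PathOperatorSketch and its method extractSegments are hypothetical names made up for this illustration. The whitespace tokenizer is an assumption that only holds for simple streams: strings, arrays, dictionaries, comments and inline images would need a real parser such as ContentScanner. The coordinates are the raw operands, so no transformation matrix is applied.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; it is not part of PDF Clown.
final class PathOperatorSketch {

    /**
     * Collects straight segments {x1, y1, x2, y2} from a decoded content stream.
     * Assumes whitespace-separated tokens and well-formed path construction.
     */
    static List<double[]> extractSegments(String contentStream) {
        List<double[]> segments = new ArrayList<>();
        List<Double> operands = new ArrayList<>();
        double startX = 0, startY = 0; // first point of the current subpath
        double curX = 0, curY = 0;     // current point

        for (String token : contentStream.trim().split("\\s+")) {
            // Numeric tokens are operands for the operator that follows them.
            if (token.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)")) {
                operands.add(Double.parseDouble(token));
                continue;
            }
            int n = operands.size();
            switch (token) {
                case "m": { // begin subpath: x y m
                    if (n >= 2) {
                        curX = operands.get(n - 2);
                        curY = operands.get(n - 1);
                        startX = curX;
                        startY = curY;
                    }
                    break;
                }
                case "l": { // line to: x y l
                    if (n >= 2) {
                        double x = operands.get(n - 2);
                        double y = operands.get(n - 1);
                        segments.add(new double[] {curX, curY, x, y});
                        curX = x;
                        curY = y;
                    }
                    break;
                }
                case "re": { // rectangle: x y width height re
                    if (n >= 4) {
                        double x = operands.get(n - 4);
                        double y = operands.get(n - 3);
                        double w = operands.get(n - 2);
                        double h = operands.get(n - 1);
                        segments.add(new double[] {x, y, x + w, y});
                        segments.add(new double[] {x + w, y, x + w, y + h});
                        segments.add(new double[] {x + w, y + h, x, y + h});
                        segments.add(new double[] {x, y + h, x, y});
                        curX = x;
                        curY = y;
                        startX = x;
                        startY = y;
                    }
                    break;
                }
                case "c":
                case "v":
                case "y": { // Bezier curves: geometry is skipped, only the end point is tracked
                    if (n >= 2) {
                        curX = operands.get(n - 2);
                        curY = operands.get(n - 1);
                    }
                    break;
                }
                case "h": { // close subpath with a segment back to its first point
                    if (curX != startX || curY != startY) {
                        segments.add(new double[] {curX, curY, startX, startY});
                    }
                    curX = startX;
                    curY = startY;
                    break;
                }
                default: // painting, state and all other operators are ignored here
                    break;
            }
            operands.clear(); // operands belong to exactly one operator
        }
        return segments;
    }
}
```

Calling PathOperatorSketch.extractSegments("10 20 m 30 20 l 30 40 l h S") returns the two drawn lines plus the closing segment back to (10, 20). With ContentScanner the same bookkeeping applies, except that the library hands you typed operation objects instead of raw tokens and also tracks the graphics state for you.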