PDF document manipulation

Asked by AudioBubble on 08 April 2009 at 15:47

I have several PDFs with the following properties:

Each PDF contains a variable number of "documents", each with a different number of pages.

Each page in a "document" has text such as "Page 3 of 26".

I want to automatically identify the first and last page of each "document" within a PDF and extract them into a new PDF for later printing and archival. (Note: this is not the same as the first and last page of the PDF itself, since each PDF may contain several "documents".)

I'm not sure what tools I can bring to bear on this problem or what libraries are available to tackle it.

Any recommendations? Preferably something free that can be used to build a tool that runs on Windows.


There are 3 best solutions below

Adam Rosenfield On 08 April 2009 at 15:53

You can try using pdftk to decompress the PDF, parse the data, split it, and then recompress it.
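A rough sketch of how those pdftk steps might be driven from a script. The file names and helper function names are hypothetical; `uncompress`, `cat`, and `compress` are standard pdftk operations. Parsing the uncompressed data to decide which pages to keep is not shown here.

```python
import subprocess

def uncompress_cmd(src, dest):
    # pdftk <in> output <out> uncompress
    # Expands the page streams so they can be inspected as plain data.
    return ["pdftk", src, "output", dest, "uncompress"]

def extract_pages_cmd(src, pages, dest):
    # pdftk <in> cat <page numbers...> output <out> compress
    # Copies only the listed (1-based) pages and recompresses the result.
    return ["pdftk", src, "cat", *[str(p) for p in pages], "output", dest, "compress"]

def run(cmd):
    # Assumes pdftk is installed and on the PATH; raises if it fails.
    subprocess.run(cmd, check=True)
```

For example, `run(extract_pages_cmd("in.pdf", [1, 26], "out.pdf"))` would write pages 1 and 26 of a hypothetical in.pdf to out.pdf.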

AudioBubble On 08 April 2009 at 16:40

I managed to come up with a horrible Unix hack that will work:

1. Use pdftk to decompress the PDF and explode it into separate pages.
2. Use pdftotext to convert each page into text.
3. Write a script to identify the appropriate string in the text and copy the corresponding PDF page into a sub-directory [in progress].
4. Find some tool to recombine the pages [to be investigated, pdftk can probably do it].
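The script in the third step could look something like the sketch below. It assumes the page label always matches the pattern "Page N of M" and that pdftotext is on the PATH; the function names are made up for illustration. As a small variation on the steps above, it asks pdftotext for one page at a time with `-f`/`-l` instead of exploding the PDF first.

```python
import re
import subprocess

# Matches labels such as "Page 3 of 26".
PAGE_RE = re.compile(r"Page\s+(\d+)\s+of\s+(\d+)")

def page_text(pdf_path, page_no):
    # pdftotext -f N -l N file.pdf -  writes the text of page N to stdout.
    result = subprocess.run(
        ["pdftotext", "-f", str(page_no), "-l", str(page_no), pdf_path, "-"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout

def first_and_last_pages(texts):
    """Given the text of each PDF page in order, return the 1-based
    PDF page numbers that start or end a "document"."""
    keep = []
    for pdf_page, text in enumerate(texts, start=1):
        match = PAGE_RE.search(text)
        if not match:
            continue  # no label on this page; skip it
        number, total = int(match.group(1)), int(match.group(2))
        if number == 1 or number == total:
            keep.append(pdf_page)
    return keep
```

The resulting page numbers can then be handed to something like `pdftk in.pdf cat ... output out.pdf` for the recombining step.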

This should work on my Unix platform, but I'm not sure whether it is acceptable to bring all these tools into the Windows environment.

One possibility is to use an email gateway that receives PDFs and returns the processed PDF, which makes it even uglier.

Does anyone have a native Win32 solution?

Steve K On 08 April 2009 at 16:47

Java has a nice free PDF library. Check out iText.

From iText's site:

You can use iText to:

Serve PDF to a browser
Generate dynamic documents from XML files or databases
Use PDF's many interactive features
Add bookmarks, page numbers, watermarks, etc.
Split, concatenate, and manipulate PDF pages
Automate filling out of PDF forms
Add digital signatures to a PDF file
And much more...

Since it's Java, there should be no issues running on Windows, or anywhere else for that matter.
