I have several PDFs with the following properties:
Each PDF contains a variable number of "documents", each with a different number of pages.
Each page in a "document" has text such as "Page 3 of 26".
I want to automatically identify the first and last page of each "document" within a PDF and extract those pages into a new PDF for later printing and archiving. (Note: these are not the first and last pages of the PDF itself, since one PDF may contain several "documents".)
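A minimal sketch of the selection logic, assuming every page carries a footer matching "Page X of Y" and that the PDF has an extractable text layer (scanned images would need OCR first). It keeps a page when X is 1 or X equals Y. The PDF I/O uses the free pypdf library (`PdfReader`, `PdfWriter`), which runs on Windows; the function names `select_boundary_pages` and `extract_boundary_pages` are hypothetical, chosen for illustration.

```python
import re
from typing import List

# Matches footer text like "Page 3 of 26". Assumption: footers use this wording.
PAGE_RE = re.compile(r"Page\s+(\d+)\s+of\s+(\d+)", re.IGNORECASE)


def select_boundary_pages(page_texts: List[str]) -> List[int]:
    """Return 0-based indices of pages that start or end a "document"."""
    selected = []
    for index, text in enumerate(page_texts):
        match = PAGE_RE.search(text or "")
        if match is None:
            continue  # no "Page X of Y" marker on this page; skip it
        current, total = int(match.group(1)), int(match.group(2))
        # A one-page document ("Page 1 of 1") is added once, not twice.
        if current == 1 or current == total:
            selected.append(index)
    return selected


def extract_boundary_pages(in_path: str, out_path: str) -> List[int]:
    """Copy the first/last page of each "document" in in_path into out_path."""
    # Imported here so select_boundary_pages works without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(in_path)
    texts = [page.extract_text() for page in reader.pages]
    indices = select_boundary_pages(texts)
    writer = PdfWriter()
    for i in indices:
        writer.add_page(reader.pages[i])
    with open(out_path, "wb") as f:
        writer.write(f)
    return indices
```

Usage would look like `extract_boundary_pages("input.pdf", "boundaries.pdf")` (hypothetical file names). How reliably `extract_text()` recovers the footer depends on how each PDF encodes its text, so the regex may need adjusting for real files.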
I'm not sure which tools or libraries are available to tackle this.
Any recommendations? Ideally they should be free and usable for building a tool that runs on Windows.
I managed to come up with a horrible Unix hack that will work:
It should work on my Unix platform, but I'm not sure it is acceptable to bring all these tools into the Windows environment.
Another option is to use an email gateway that receives PDFs and returns the processed PDF, which makes it even uglier.
Does anyone have a native Win32 solution?