I have a folder where multiple clients upload multiple PDF files.
Some of them are using embedded fonts, some doesn't.
I've been working on a service that optimizes (in terms of file size) the PDF files in this folder.
Each user may be uploading around 400 files, weighing anywhere between 80K to 10M, and my task is to optimize all of them to the smallest possible file size with minimal quality lose.
the PDF Library is doing a great job with it. My only problem is that I can't remove all embedded fonts from all files, since some of the files might use these fonts and the result would be a file that I can't use.
So my questions are:
- How can I detect what files use and what files doesn't use embedded fonts?
- When optimizing the files that use embedded fonts, How can I remove only the unused fonts?
what I want to achieve is to remove all embedded fonts from most of the files, but keep the embedded fonts in the files where I actually need them. I understand that it depends on the fonts I have on my system (these files should stay on a single system so portability is not that important to me), so I try to find a way to identify, before optimizing, what files will look OK without embedded fonts, and what files I need to keep the embedded fonts.
APDFL has a PDFontIsEmbedded() call. The DotNet interface's Font class has an Embedded property. Saving with the GarbageCollect SaveFlag should remove any unreferenced indirect objects, including fonts.
Note that Resource Dictionaries could potentially be shared by multiple pages so that fonts not used by one page might be used by another page that uses the same resource dictionary.