I am trying to extract images from a PDF file using PDFsharp. In my test file, the image's filter type is reported as /JBIG2. I would like help understanding how to decode this image and save it, if that is at all possible with PDFsharp.
The code I'm using to extract the image and then save it is as follows:
const string filename = "../../../test.pdf";
PdfDocument document = PdfReader.Open(filename);
int imageCount = 0;
foreach (PdfPage page in document.Pages) { // Iterate pages
    // Get resources dictionary
    PdfDictionary resources = page.Elements.GetDictionary("/Resources");
    if (resources != null) {
        // Get external objects dictionary
        PdfDictionary xObjects = resources.Elements.GetDictionary("/XObject");
        if (xObjects != null) {
            ICollection<PdfItem> items = xObjects.Elements.Values;
            foreach (PdfItem item in items) { // Iterate references to external objects
                PdfReference reference = item as PdfReference;
                if (reference != null) {
                    PdfDictionary xObject = reference.Value as PdfDictionary;
                    // Is external object an image?
                    if (xObject != null && xObject.Elements.GetString("/Subtype") == "/Image") {
                        ExportImage(xObject, ref imageCount);
                    }
                }
            }
        }
    }
}
static void ExportImage(PdfDictionary image, ref int count) {
    string filter = image.Elements.GetName("/Filter");
    switch (filter) {
        case "/DCTDecode":
            ExportJpegImage(image, ref count);
            break;
        case "/FlateDecode":
            ExportAsPngImage(image, ref count);
            break;
    }
}
static void ExportJpegImage(PdfDictionary image, ref int count) {
    // Fortunately, JPEG has native support in PDF, so exporting an image is just writing the stream to a file.
    byte[] stream = image.Stream.Value;
    // The using statements close the writer and the file even if Write throws.
    using (FileStream fs = new FileStream(
        String.Format("Image{0}.jpeg", count++), FileMode.Create, FileAccess.Write))
    using (BinaryWriter bw = new BinaryWriter(fs)) {
        bw.Write(stream);
    }
}
In the above, the filter type I get is /JBIG2, for which I do not have support. The code above is taken from the PDFsharp sample "Export Images".
JBIG2 is used mostly inside PDF files; outside of PDF it is a different story. Although .jbig2 is a raster image format, few image viewers support it. Your best bet would be to export the image as a CCITT Group 4 compressed TIFF, as Acrobat does.
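If you only need to get the data out of the PDF and are happy to decode it elsewhere, a rough sketch is to wrap the embedded stream in a JBIG2 file header so that a standalone decoder (jbig2dec, for example) has a chance of reading it. This is only a sketch under some assumptions: Jbig2Standalone and Wrap are names I made up and are not part of PDFsharp, the image is treated as a single page, and the caller has to fetch the optional /JBIG2Globals stream from the image's /DecodeParms. PDF stores JBIG2 data without the file header and without the end-of-page and end-of-file segments, so a strict decoder may still reject the result.

```csharp
using System;
using System.IO;

static class Jbig2Standalone
{
    // JBIG2 file header: 8-byte ID string, one flags byte
    // (0x01 = sequential organisation, page count known),
    // then the page count as a 4-byte big-endian integer (assumed: 1 page).
    static readonly byte[] FileHeader =
    {
        0x97, 0x4A, 0x42, 0x32, 0x0D, 0x0A, 0x1A, 0x0A,
        0x01,
        0x00, 0x00, 0x00, 0x01
    };

    // globals:  bytes of the /JBIG2Globals stream, or null if the image has none.
    // pageData: raw bytes of the image XObject's stream (e.g. image.Stream.Value).
    public static byte[] Wrap(byte[] globals, byte[] pageData)
    {
        if (pageData == null) throw new ArgumentNullException("pageData");
        using (var ms = new MemoryStream())
        {
            ms.Write(FileHeader, 0, FileHeader.Length);
            // Global segments must come before the page's own segments.
            if (globals != null) ms.Write(globals, 0, globals.Length);
            ms.Write(pageData, 0, pageData.Length);
            return ms.ToArray();
        }
    }
}

// Hypothetical use from an extra case in ExportImage:
//   byte[] file = Jbig2Standalone.Wrap(null, image.Stream.Value);
//   File.WriteAllBytes(String.Format("Image{0}.jb2", count++), file);
```

This does not decode anything; it only produces a file you can hand to a JBIG2 decoder, whose bitmap output you could then save as TIFF or PNG.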