I understand PDFKit allows extracting text and formatting as an NSAttributedString, but I can't find any information on extracting each individual figure from a PDF document using Swift.
Any help would be greatly appreciated, thanks!
edit: https://stackoverflow.com/a/40788449/2303865 explains how to convert a whole page into an image. However, I need to extract all the images already embedded in a series of PDF documents, without knowing where they are located, so that solution does not answer my question.
Here is a Swift function that extracts images, or more precisely all objects with Subtype "Image", from PDF pages:
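A minimal sketch of this kind of function, written against Core Graphics' `CGPDFDocument` API rather than copied from any particular listing. The `EmbeddedImage` type and the `extractEmbeddedImages(from:)` name are hypothetical, and the sketch assumes each page dictionary carries its own `Resources` entry; it does not follow resources inherited from a parent node or images nested inside form XObjects.

```swift
import Foundation
import CoreGraphics

/// Hypothetical container for one image stream found on a page.
/// `isJPEG` is true when Core Graphics reports the copied data as
/// JPEG-encoded, in which case it can be written to a .jpg file as-is.
struct EmbeddedImage {
    let page: Int
    let name: String
    let data: Data
    let isJPEG: Bool
}

func extractEmbeddedImages(from url: URL) -> [EmbeddedImage] {
    guard let document = CGPDFDocument(url as CFURL),
          document.numberOfPages > 0 else { return [] }
    var images: [EmbeddedImage] = []

    // CGPDFDocument page numbers start at 1.
    for pageNumber in 1...document.numberOfPages {
        guard let page = document.page(at: pageNumber),
              let pageDictionary = page.dictionary else { continue }

        // Assumption: Resources sits directly on the page dictionary.
        var resources: CGPDFDictionaryRef?
        guard CGPDFDictionaryGetDictionary(pageDictionary, "Resources", &resources),
              let pageResources = resources else { continue }

        var xObjects: CGPDFDictionaryRef?
        guard CGPDFDictionaryGetDictionary(pageResources, "XObject", &xObjects),
              let pageXObjects = xObjects else { continue }

        // Visit every XObject; returning true continues the iteration.
        CGPDFDictionaryApplyBlock(pageXObjects, { key, object, _ in
            var stream: CGPDFStreamRef?
            guard CGPDFObjectGetValue(object, .stream, &stream),
                  let imageStream = stream,
                  let streamDictionary = CGPDFStreamGetDictionary(imageStream)
            else { return true }

            // Keep only streams whose Subtype is "Image".
            var subtype: UnsafePointer<Int8>?
            guard CGPDFDictionaryGetName(streamDictionary, "Subtype", &subtype),
                  let subtypeName = subtype,
                  String(cString: subtypeName) == "Image" else { return true }

            var format = CGPDFDataFormat.raw
            guard let data = CGPDFStreamCopyData(imageStream, &format) else { return true }

            images.append(EmbeddedImage(page: pageNumber,
                                        name: String(cString: key),
                                        data: data as Data,
                                        isJPEG: format == .jpegEncoded))
            return true
        }, nil)
    }
    return images
}
```

Calling `extractEmbeddedImages(from: someURL)` and writing out the `data` of every entry with `isJPEG == true` should give usable `.jpg` files; entries where it is false hold pixel data that still needs to be interpreted.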
You should know that images in PDFs can be represented in different ways. They can be embedded as self-contained JPGs, or as raw pixel data (uncompressed or losslessly compressed) accompanied by meta information about the compression, color space, width, height, and so forth.
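To make that distinction concrete, here is a small sketch that classifies an image stream by the names in its `Filter` entry. The `EmbeddedImageKind` enum and `classifyImage(filters:)` function are hypothetical names. The filter names themselves (`DCTDecode` for JPEG, `JPXDecode` for JPEG 2000) come from the PDF specification; everything else is lumped together here as pixel data, which is a simplification.

```swift
/// Hypothetical classification of an image stream based on its filters.
enum EmbeddedImageKind: Equatable {
    case jpeg                          // stream data is a complete JPEG file
    case jpeg2000                      // stream data is a JPEG 2000 file
    case pixelData(filters: [String])  // needs ColorSpace, Width, Height, etc.
}

/// `filters` lists the stream's Filter names in decoding order.
/// The last filter is the one that produces the image samples,
/// so it decides how the image is encoded.
func classifyImage(filters: [String]) -> EmbeddedImageKind {
    switch filters.last {
    case "DCTDecode":
        return .jpeg
    case "JPXDecode":
        return .jpeg2000
    default:
        // No filter (uncompressed) or e.g. FlateDecode / LZWDecode.
        return .pixelData(filters: filters)
    }
}
```

For example, `classifyImage(filters: ["FlateDecode"])` yields `.pixelData(filters: ["FlateDecode"])`, signalling that the stream's meta information is needed to rebuild the picture.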
So if you only want to export embedded JPGs, this code works just fine. But if you also want to visualise the raw images, you will need more parsing code. To get started, you can look at the PDF 2.0 spec (or an older free version of the spec) and this gist, which interprets JPGs in any color profile and raw images with any of the following color profiles: