I found a short C++ example on Stack Overflow that extracts text from a PDF using the poppler C++ API, but I cannot find any call to extract the images.
I have a PDF that is a sequence of scans from a copier. I would like to open the scanned pages one after another:
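#include <iostream>
#include <memory>
#include "poppler-document.h"
#include "poppler-page.h"
#include "poppler-image.h"
using namespace std;

int main()
{
    // load_from_file returns a null pointer if the file cannot be opened
    unique_ptr<poppler::document> doc(poppler::document::load_from_file("./test1dld_scan.pdf"));
    if (!doc) {
        cerr << "Unable to load the document." << endl;
        return 1;
    }
    const int num_pages = doc->pages();
    cout << "page count: " << num_pages << endl;
    for (int i = 0; i < num_pages; ++i) {
        // create_page hands back an owning raw pointer, so wrap it
        unique_ptr<poppler::page> page(doc->create_page(i));
        if (!page) {
            cerr << "Unable to create the page." << endl;
            return 1;
        }
        // auto images = page->   // right here: what do I call to get the images?
    }
    return 0;
}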
There is a poppler::image class, but I can't find anywhere in the documentation how to obtain one. There is no mention of images in the document class, and none in the page class either.
OK, I figured it out. I am not aware of any poppler C++ API that extracts the embedded image itself from a PDF file, but poppler::page_renderer can render each page to a poppler::image at any desired DPI. Ideally the DPI should match the resolution of the scan inside, so the rendered output is neither upsampled nor downsampled.
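To pick that DPI, here is a rough sketch of the arithmetic. PDF page sizes are measured in points (1/72 inch), so a scan that fills the whole page has a native resolution of its pixel width times 72, divided by the page width in points. The helper names below are my own and not part of poppler, and the sketch assumes the scan covers the full page width. The pixel width has to come from somewhere else (for example, `pdfimages -list` from poppler-utils prints it), while the page width in points is what poppler::page::page_rect().width() reports.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of poppler): estimate the native DPI of a
// scan that fills the whole page.
//   image_width_px: pixel width of the embedded scan
//   page_width_pt:  page width in PDF points (1 point = 1/72 inch)
double native_dpi(int image_width_px, double page_width_pt)
{
    assert(image_width_px > 0 && page_width_pt > 0.0);
    const double page_width_in = page_width_pt / 72.0;  // points -> inches
    return image_width_px / page_width_in;              // pixels per inch
}

// Round to the nearest whole DPI, since scanner settings are whole numbers.
int native_dpi_rounded(int image_width_px, double page_width_pt)
{
    return static_cast<int>(std::lround(native_dpi(image_width_px, page_width_pt)));
}
```

For example, a US Letter page is 612 points wide (8.5 inches), so a scan that is 2550 pixels wide works out to 300 DPI.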
My PDF consists of scanned images. I render it at 300 DPI and it works: