So when I use the pdf2image
python import, and pass a multi page PDF into the convert_from_bytes()
- or convert_from_path()
method, the output array does contain multiple images - but all images are of the last PDF page (whereas I would've expected that each image represented one of the PDF pages).
The output looks something like this:
Any idea on why this would occur? I can't find any solution to this online. I've found some vague suggestion that the use_cropbox
argument might be used, but modifying it has no effect.
def convert(opened_file)
# Read PDF and convert pages to PPM image objects
try:
_ppm_pages = self.pdf2image.convert_from_bytes(
opened_file.read(),
grayscale = True
)
except Exception as e:
print(f"[CreateJPEG] Could not convert PDF pages to JPEG image due to error: \n '{e}'")
return
# Do stuff with _ppm_pages
for img in _ppm_pages:
img.show() # ...all images in that list are of the last page
Sometimes the output is an empty 1x1 image, instead, which I also haven't found a reason for. So if you have any idea what that is about, please do let me know!
Thanks in advance, Simon
EDIT: Added code.
EDIT: So, when I try this in a random notebook, it actually works fine.
I've removed a few detours I used in my original code, and now it works. Still not sure what the underlying reason was though...
All the same, thanks for your help, everyone!
I'm using this right now....
That gives me a list of images within imgSet