get image byte in page

I am reading a PDF with PdfReader from the iText library:

PdfReader reader = new PdfReader();
byte[] content = reader.getPageContent(4);

my page(4) content a image, getPageContent return byte[]

The image is empty in the result.

There is 1 best solution below

When you do reader.getPageContent(4), you get a byte[] containing PDF syntax. For instance:

BT
36 788 Td
/F1 12 Tf
(Hello World )Tj
ET
q
0 0 m
595 842 l
S
Q

This is in no way an image, and in no way content that can be used as a standalone object. For instance, /F1 refers to a resource, more specifically to a font. Without looking at the /Resources of the page from which we extracted the PDF syntax, we have no idea what the PDF string (Hello World ) looks like.
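To see that such a byte[] is text-like PDF syntax rather than image data, one can decode it and list the resource names it refers to. This is only a minimal sketch using the Java standard library: the class and method names are hypothetical, and the whitespace tokenizer is a simplification that does not handle PDF strings, comments, or inline images correctly.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;

class ResourceNameSketch {

    /**
     * Returns the names (tokens starting with '/') found in a content stream.
     * Assumption: tokens are separated by whitespace, which is not true for
     * every valid content stream.
     */
    static Set<String> referencedNames(byte[] content) {
        // Content streams are byte-oriented; ISO-8859-1 maps each byte to one char.
        String syntax = new String(content, StandardCharsets.ISO_8859_1);
        Set<String> names = new LinkedHashSet<>();
        for (String token : syntax.trim().split("\\s+")) {
            if (token.length() > 1 && token.startsWith("/")) {
                names.add(token.substring(1));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        byte[] content = "BT\n36 788 Td\n/F1 12 Tf\n(Hello World )Tj\nET\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        // Each name still has to be looked up in the page's /Resources.
        System.out.println(referencedNames(content)); // prints [F1]
    }
}
```

The output is just a name. What that name means (here, which font) can only be found by looking it up in the /Resources of the page.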

The title of your question, "get image byte in page", is wrong. You say "my page(4) content a image", but that isn't English. Let's assume you mean "my page 4 contains an image". In that case, the byte[] returned by getPageContent() will look somewhat like this:

q 20 0 0 20 36 786 cm /img0 Do Q

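In this syntax, q and Q save and restore the graphics state. The cm operator changes the current transformation matrix, which defines the size and the position of the image: it will be 20 by 20 user units, with its lower-left corner at x = 36 and y = 786. The actual image is stored as an Image XObject in the /Resources of the page dictionary, under the name /img0. It is added to the page using the Do operator. The byte[] returned by getPageContent() therefore contains only this reference to the image, not the bytes of the image itself.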
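The way cm and Do work together can be sketched in plain Java. This is an illustration under stated assumptions, not how iText parses content: the class, its fields, and firstPlacement are hypothetical names, tokens are assumed to be whitespace-separated, only the most recent cm is used (no matrix concatenation, no q/Q stack), and the matrix is assumed to have no rotation or skew, so width and height can be read directly from its first and fourth operands.

```java
import java.nio.charset.StandardCharsets;

class ImagePlacementSketch {

    /** Name of the XObject plus where it is painted, in user units. */
    static final class Placement {
        final String name;
        final double x, y, width, height;

        Placement(String name, double x, double y, double width, double height) {
            this.name = name;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /** Returns the first "/name Do" found, or null if there is none. */
    static Placement firstPlacement(byte[] content) {
        String[] t = new String(content, StandardCharsets.ISO_8859_1).trim().split("\\s+");
        double[] m = {1, 0, 0, 1, 0, 0}; // identity matrix: a b c d e f
        for (int i = 0; i < t.length; i++) {
            if (t[i].equals("cm") && i >= 6) {
                // The six operands directly before cm are a b c d e f.
                for (int k = 0; k < 6; k++) {
                    m[k] = Double.parseDouble(t[i - 6 + k]);
                }
            } else if (t[i].equals("Do") && i >= 1 && t[i - 1].startsWith("/")) {
                // An image fills the unit square, so with b = c = 0 the matrix
                // scales it to a x d and moves its lower-left corner to (e, f).
                return new Placement(t[i - 1].substring(1), m[4], m[5], m[0], m[3]);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] content = "q 20 0 0 20 36 786 cm /img0 Do Q"
                .getBytes(StandardCharsets.ISO_8859_1);
        Placement p = firstPlacement(content);
        System.out.println(p.name + " at (" + p.x + ", " + p.y + "), size "
                + p.width + " x " + p.height);
    }
}
```

Note that the sketch only recovers the name img0 and its placement. The image bytes themselves have to be read from the Image XObject stored under that name in the page resources.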

If you do not understand a word of what I'm saying in this answer, you should start by reading ISO 32000-1, or why not start with the iText documentation?

See for instance: