Consider this file: https://github.com/modesty/pdf2json/files/13788866/test.pdf
When I run pdf2json -f test.pdf, I get the following coordinates / dimensions for the "yyy" field:
"x": 11.924,
"y": 7.027,
"w": 9.375,
"h": 1.292
But when I run qpdf test.pdf --qdf test.qdf, I get this:
/Rect [
190.784
658.903
340.784
680.903
]
My question is: why aren't they the same, and how do I convert one to the other and vice versa?
The page sizes are different as well. The QDF output gives this as the size:
%% Page 1
%% Original object ID: 18 0
21 0 obj
<<
/Annots 6 0 R
/Contents 22 0 R
/CropBox [
0.0
0.0
612.0
792.0
]
/MediaBox [
0.0
0.0
612.0
792.0
]
/Parent 15 0 R
/Resources <<
>>
/Rotate 0
/Type /Page
>>
endobj
So the page is 612x792 (consistent with what pdfbox -box test.pdf tells me), whereas pdf2json gives 38.25x49.5. For the page size, I can convert one to the other by multiplying or dividing by 16. For the position and dimensions of the field, however, that doesn't work.
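The page-size relationship can be confirmed with a few lines of Python that use only the numbers quoted above: if one scale factor links the two outputs, the width ratio and the height ratio must agree.

```python
# Page size in PDF points (1/72 inch), from the /MediaBox and /CropBox above,
# and the page size pdf2json reports for the same page.
pdf_width_pt, pdf_height_pt = 612.0, 792.0
json_width, json_height = 38.25, 49.5

# A single scale factor between the two outputs means both ratios are equal.
ratio_w = pdf_width_pt / json_width
ratio_h = pdf_height_pt / json_height
print(ratio_w, ratio_h)  # 16.0 16.0
```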
Presumably 658.903 in the qpdf output corresponds to 11.924 (x) in the pdf2json output, but 658.903/11.924 is about 55.26. Likewise, 190.784 should correspond to 7.027 (y), but 190.784/7.027 is about 27.15. So there's no constant multiplier that transforms one set of coordinates into the other.
For good measure I also tried 680.903/11.924 (57.10) and 340.784/7.027 (48.50), and those don't match either.
So how do pdf2json's coordinates / dimensions relate to the numbers in /Rect? Do they relate at all?
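The source PDF's /Type /Page has /CropBox [0 0 612 792], and the linked /Type /Annot has /Rect [190.784 658.903 340.784 680.903]. Note that a PDF rectangle is written as [llx lly urx ury] (lower-left x, lower-left y, upper-right x, upper-right y), so 190.784 is the field's x, not 658.903, and the y values are measured up from the bottom of the page.
That makes the field box 150 units wide (not all shown here) by 22 units high. With no transformations in effect, these can be taken as plain point sizes at 1/72" per unit. For whatever reason, the JSON output currently expresses these values in sixteenths: each pdf2json unit is 16 points (16/72"), so a JSON value is the point value divided by 16. (This can differ on other devices and pages.)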
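To make the /Rect arithmetic concrete, here is a small Python sketch that turns the annotation's rectangle into a width and height in points and inches. The helper name rect_size is hypothetical; it sorts each axis because the PDF format only requires two diagonally opposite corners, even though [llx lly urx ury] is the usual order.

```python
POINTS_PER_INCH = 72.0  # default PDF user space unit: 1/72 inch

def rect_size(rect):
    """Return (width, height) of a PDF rectangle, in points.

    A PDF rectangle is given as two diagonally opposite corners, normally
    [llx lly urx ury]; sorting each axis makes the result order-independent.
    """
    x1, y1, x2, y2 = rect
    llx, urx = sorted((x1, x2))
    lly, ury = sorted((y1, y2))
    return urx - llx, ury - lly

rect = [190.784, 658.903, 340.784, 680.903]  # /Rect of the "yyy" field
w_pt, h_pt = rect_size(rect)
print(round(w_pt, 3), round(h_pt, 3))  # 150.0 22.0
print(round(w_pt / POINTS_PER_INCH, 3), round(h_pt / POINTS_PER_INCH, 3))  # 2.083 0.306
```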
https://github.com/search?q=repo%3Amodesty%2Fpdf2json+units&type=issues
So to convert the JSON units you describe back to PDF points, just multiply by 16 (and divide by 16 to go the other way)!
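A minimal Python sketch of the conversion in both directions follows. It rests on two assumptions inferred from this file's numbers rather than from pdf2json's documentation: one JSON unit is 16 points (612/38.25), and pdf2json's y is measured down from the top of the page, whereas /Rect y values are measured up from the bottom. The function names json_to_rect and rect_to_json are hypothetical.

```python
PT_PER_UNIT = 16.0  # assumed: inferred from 612 pt / 38.25 and 792 pt / 49.5

def json_to_rect(x, y, w, h, page_height_pt):
    """pdf2json box (assumed top-left origin) -> PDF [llx lly urx ury] in points."""
    llx = x * PT_PER_UNIT
    urx = (x + w) * PT_PER_UNIT
    ury = page_height_pt - y * PT_PER_UNIT        # top edge, flipped to bottom-left origin
    lly = page_height_pt - (y + h) * PT_PER_UNIT  # bottom edge
    return [llx, lly, urx, ury]

def rect_to_json(rect, page_height_pt):
    """PDF /Rect in points -> pdf2json-style box (assumed top-left origin)."""
    x1, y1, x2, y2 = rect
    llx, urx = min(x1, x2), max(x1, x2)
    lly, ury = min(y1, y2), max(y1, y2)
    return {
        "x": llx / PT_PER_UNIT,
        "y": (page_height_pt - ury) / PT_PER_UNIT,  # distance from the top edge
        "w": (urx - llx) / PT_PER_UNIT,
        "h": (ury - lly) / PT_PER_UNIT,
    }

box = {"x": 11.924, "y": 7.027, "w": 9.375, "h": 1.292}  # pdf2json output above
print([round(v, 3) for v in json_to_rect(**box, page_height_pt=792.0)])
# [190.784, 658.896, 340.784, 679.568]
```

Compared with the /Rect [190.784 658.903 340.784 680.903], the left and right edges match exactly and the bottom edge matches to within the 3-decimal rounding of the JSON. The top edge comes out about 1.3 points lower because pdf2json's h of 1.292 (20.672 points) is smaller than the /Rect height of 22 points; these numbers alone don't explain why pdf2json reports a shorter height for this field.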