I am trying to create a minimal PDF file using the PDF 2.0 standard, based on the ISO specification. I would like to avoid the classic xref table and trailer section and use only a cross-reference stream dictionary instead.
The file opens in Adobe Acrobat, but when I close it, Acrobat offers to save it, which I assume it does in order to repair what it considers a corrupted document structure.
So I guess that my PDF does not comply with PDF 2.0. But why not?
Here is my code for the PDF-2.0 File:
UPDATE: I tried to follow some of the comments; thanks for the input. The changes were:
- Added /Length to the XRef stream, as it seems to be required for a stream (strange, since the spec shows examples without /Length, but it is added now). I have not (yet) added page content, as it is defined to be optional. The new version of the PDF opens in Acrobat, but the save dialog still appears when closing.
I still don't know what is missing for Acrobat to recognize the file as valid and stop prompting to save.
%PDF-2.0
%Óëéá
1 0 obj
<</Type /Catalog
/Pages 2 0 R
/Metadata 5 0 R
>>
endobj
2 0 obj
<</Type /Pages
/Kids [3 0 R 4 0 R]
/Count 2
>>
endobj
3 0 obj
<</Type /Page
/Parent 2 0 R
/MediaBox [0 0 595 842]
>>
endobj
4 0 obj
<</Type /Page
/Parent 2 0 R
/MediaBox [0 0 595 842]
>>
endobj
5 0 obj
<</Type /Metadata
/Subtype /XML
/Length /2880
>>
stream
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="" xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
<pdf:Producer>PdfProd</pdf:Producer>
</rdf:Description>
<rdf:Description rdf:about="" xmlns:xmp="http://ns.adobe.com/xap/1.0/">
<xmp:CreateDate>2024-02-28T23:46:34+01:00</xmp:CreateDate>
</rdf:Description>
<rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:format>application/pdf</dc:format>
</rdf:Description>
<rdf:Description rdf:about="" xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">
<xmpMM:DocumentID>f2015454-8669-45e4-9218-ad61ad0e2082</xmpMM:DocumentID>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
endstream
endobj
6 0 obj
<</Type /XRef
/Index [0 7]
/Size 7
/W [1 2 1]
/Root 1 0 R
/ID [<1f7139e82f1c048ff020a6c953c3addd><1f7139e82f1c048ff020a6c953c3addd>]
/Length 77
>>
stream
00 0000 00
01 000F 01
01 004F 02
01 008D 03
01 00D3 04
01 0119 05
01 0CAB 06
endstream
endobj
startxref
3405
%%EOF
2ND UPDATE: I tried to implement all the suggestions; many thanks for the very useful input in the comments. After these changes, an online PDF validator says the file is OK. But in fact it is now even worse for Acrobat: it can no longer open the file at all ("The file is damaged and could not be repaired."). Thanks in advance for any help!
%PDF-2.0
%Óëéá
1 0 obj
<</Type /Catalog
/Metadata 2 0 R
/Pages 3 0 R
>>
endobj
2 0 obj
<</Type /Metadata
/Length 2881
/Subtype /XML
>>
stream
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="" xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
<pdf:Producer>Pdf2You</pdf:Producer>
</rdf:Description>
<rdf:Description rdf:about="" xmlns:xmp="http://ns.adobe.com/xap/1.0/">
<xmp:CreateDate>2024-03-04T23:42:40+01:00</xmp:CreateDate>
</rdf:Description>
<rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:format>application/pdf</dc:format>
</rdf:Description>
<rdf:Description rdf:about="" xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">
<xmpMM:DocumentID>7525a6cc-24c0-4d27-a995-17ee0436f906</xmpMM:DocumentID>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
endstream
endobj
3 0 obj
<</Type /Pages
/Kids [4 0 R]
/Count 1
>>
endobj
4 0 obj
<</Type /Page
/Parent 3 0 R
/MediaBox [0 0 595 842]
/Resources<<>>
>>
endobj
5 0 obj
<</Type /XRef
/Length 66
/Index [0 6]
/Filter /ASCIIHexDecode
/Size 6
/W [1 2 1]
/Root 1 0 R
/ID [<884dfb9a4ffe1d4accf3d4454478960f><884dfb9a4ffe1d4accf3d4454478960f>]
>>
stream
00 0000 00
01 000F 00
01 004F 00
01 0BE0 00
01 0C18 00
01 0C6D 00
endstream
endobj
startxref
3181
%%EOF
PS: All line endings are LF. PPS: The validation tool that reports the file as valid is https://www.pdf-online.com/osa/validate.aspx
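Since every offset in a hand-written file depends on the exact byte count (LF versus CRLF included), one quick sanity check is to confirm that the number after startxref really lands on an object header. The Python sketch below does only that; the function name `check_startxref` and the tiny `sample` file are made up for illustration, and passing this check does not by itself mean Acrobat will accept the file.

```python
import re

def check_startxref(pdf: bytes) -> bool:
    """Return True if the startxref offset points at an 'N G obj' header."""
    # Find the final "startxref <offset> %%EOF" sequence in the file.
    m = re.search(rb"startxref\s+(\d+)\s+%%EOF\s*$", pdf)
    if m is None:
        raise ValueError("no startxref/%%EOF sequence found at end of file")
    offset = int(m.group(1))
    # The byte at that offset must start an object header such as "5 0 obj".
    return re.match(rb"\d+ \d+ obj\b", pdf[offset:]) is not None

# Hypothetical 9-byte header followed by one object, LF line endings only.
sample = b"%PDF-2.0\n1 0 obj\n<<>>\nendobj\nstartxref\n9\n%%EOF\n"
print(check_startxref(sample))
```

For a real file, read it in binary mode (`open(path, "rb").read()`) so that line endings are not translated before the offsets are checked.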
I am showing the more common features of your file from the ISO 2.0 standard that will be accepted by most, if not all, version 1 or 2 PDF readers, without them or Acrobat attempting any "fix" on opening or closing.
The smallest possible file with a "Trailer" that is acceptable to Acrobat etc. is roughly 300 bytes (303 with the preferred EOL after EOF).
The smallest with an XRef stream that is not "fixed" or "rejected" by Acrobat Viewer 6 or later is 371 bytes (perhaps 370 if you omit the EOL after %%EOF)!
It does not matter in which order the objects are numbered; commonly the metadata would be the 3rd object if an info section were in the first location. Acrobat will normally "fix" a PDF by adding a duplicate info section first, with entries selected from the metadata. However, here, for "minimal acceptable" to Acrobat and conforming readers, there is no /Info section.
Note the standard says there does not "need" to be a metadata section, so it can be deleted to give a smaller example.
Usually /Type is found as the last entry in an object, while we may logically expect or prefer it first. Comments related to altering your version are at the end.
Alterations
Pages should include a /Count, even if there is only a single page [0].
<</Type/Pages/Count 1/Kids[3 0 R]>>
A page should imply some contents (even if we declare it empty).
<</Type/Page/Parent 2 0 R/MediaBox[0 0 612 792]/Contents 4 0 R/Resources<<>>>>
A minimal page content can be acceptable as.
Others have suggested an altered "xref" structure. However, although the standard "HINTS" at the type you want (an expanded cross-reference text stream), I have yet to see that documented format accepted by Adobe Acrobat. I have no examples other than /FlateDecode-encoded ones.
In the standard this is "hinted" as.
Adobe added an explanation in their 1.7 Reference, same appendix H, which seems to still be current policy.
So readers should commonly use either fully Flate-compressed or fully uncompressed data and never mix the two, especially when there are edits (incremental additions, alterations, etc.).
I have used the smaller (in this case 2.0 H2 example) with inflated text version above.
[Later EDIT]
As @mkl has pointed out, your xref table can be written without the ASCII /Filter by using a pure binary stream, and all readers, including Adobe Acrobat viewers, will accept that as an equivalent working format. In effect it meets the "unencoded cross-reference streams" statement.
so replace
With
Where the stream, in ASCII terms, will include nulls and be more compact (compared to the decimal text values):
0000000001000F0001004F00010BE000010C1800010C6D00
However, for pure ANSI editing that would be an unworkable method.
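To see what that hex dump means as a binary stream, here is a minimal Python sketch that turns it into raw bytes and splits them into rows according to /W [1 2 1]. The helper name `parse_xref_rows` is my own and not part of any PDF library; it only illustrates the row layout.

```python
# Hex dump of the six cross-reference rows shown above (4 bytes per row).
HEX_ROWS = "0000000001000F0001004F00010BE000010C1800010C6D00"

def parse_xref_rows(data: bytes, widths=(1, 2, 1)):
    """Split raw xref-stream bytes into tuples, one field per /W width."""
    row_len = sum(widths)
    if len(data) % row_len:
        raise ValueError("stream length is not a multiple of the row size")
    rows = []
    for start in range(0, len(data), row_len):
        pos = start
        fields = []
        for width in widths:
            # Each field is a big-endian unsigned integer of `width` bytes.
            fields.append(int.from_bytes(data[pos:pos + width], "big"))
            pos += width
        rows.append(tuple(fields))
    return rows

raw = bytes.fromhex(HEX_ROWS)   # the bytes that go between stream/endstream
rows = parse_xref_rows(raw)

print(len(raw))                 # value to use for /Length
for obj_num, row in enumerate(rows):
    print(obj_num, row)
```

The 24 raw bytes contain NUL and other non-printable values, which is why they have to be written by a program or hex editor rather than typed into a text editor. The last row decodes to offset 3181, the same value as the startxref in the question's second file.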
Most readers that allow it to open would simply replace that section with an ASCII table, for example replacing 6 with 7,
or convert it to a Flate-compressed stream
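As a rough sketch of that Flate conversion, the same 24 binary bytes can be compressed with Python's standard `zlib` module and wrapped in an XRef stream object. The object number, /Root and /ID values are copied from the question's second file; the exact dictionary layout is my assumption, and I have not tested this output in Acrobat.

```python
import zlib

# Binary xref rows for /W [1 2 1], taken from the hex dump above.
raw = bytes.fromhex("0000000001000F0001004F00010BE000010C1800010C6D00")

# zlib.compress produces a zlib-wrapped deflate stream, which is what
# /FlateDecode expects.
compressed = zlib.compress(raw)

# /Length must be the size of the compressed data, not of the raw rows.
xref_dict = (
    b"<</Type/XRef/Size 6/W[1 2 1]/Root 1 0 R"
    b"/ID[<884dfb9a4ffe1d4accf3d4454478960f><884dfb9a4ffe1d4accf3d4454478960f>]"
    b"/Filter/FlateDecode/Length %d>>" % len(compressed)
)

xref_object = (
    b"5 0 obj\n" + xref_dict + b"\nstream\n"
    + compressed + b"\nendstream\nendobj\n"
)

# Round trip: a reader applying FlateDecode gets the original rows back.
assert zlib.decompress(compressed) == raw
```

/Index is left out here because its default is [0 Size], which already covers objects 0 to 5.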
PDF ISO Standard compliant readers (apart from Acrobat) will also consider this combination as meeting the standard, so the uncompressed form is easier to use with ISO 2.0 compliant readers, however NOT in Acrobat!
The above compressed (5 bytes shown as one) ASCII 120% expanded string is acceptable to most readers (apart from Acrobat DC). Even the Acrobat-powered plug-in within Edge will accept it!
Here Acrobat Reader in Edge refuses to open the file.
The same file in the same Edge tab, simply switched from IE mode and so using the lighter "Powered by Adobe Acrobat" plug-in, works.