PDFium library: get the comments on a page


How do I use PDFium to get the comments that were made on a PDF page in Acrobat Reader? FPDFPage_GetAnnot gives me 2 annotations, a HIGHLIGHT and a POPUP, but I can't figure out how to get the comment text.

[
    {
        "rect": {
            "left": 69.9908,
            "top": 810.952,
            "right": 139.376,
            "bottom": 795.96
        },
        "sub_type": "HIGHLIGHT",
        "objects": [
            {
                "type": "FORM",
                "rect": {
                    "left": 0.4410019,
                    "top": 14.550964,
                    "right": 68.9445,
                    "bottom": 0.44110107
                },
                "stroke_color": {
                    "r": 0,
                    "g": 0,
                    "b": 0,
                    "a": 29
                },
                "fill_color": {
                    "r": 0,
                    "g": 0,
                    "b": 0,
                    "a": 29
                },
                "stroke_width": 1
            }
        ]
    },
    {
        "rect": {
            "left": 532.14795,
            "top": 810.47205,
            "right": 716.73004,
            "bottom": 690.03503
        },
        "sub_type": "POPUP",
        "color": {
            "r": 0,
            "g": 0,
            "b": 0,
            "a": 255
        },
        "has_attachment_points": false,
        "flags": "NONE"
    }
]


I have tried multiple functions from the PDFium library.

Answer by OM PRAKASH SEERVI (best answer)

@K J, your answer was very helpful for understanding the underlying PDF objects and annotations. Using the PDFium library, I was able to extract all the comments and their State.

Using PDFium, I first got the linked annotation. The following is in Go:

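// Follow the /Popup entry to the linked annotation; skip if it cannot be obtained.
linked, err := instance.FPDFAnnot_GetLinkedAnnot(&requests.FPDFAnnot_GetLinkedAnnot{Annotation: annotation, Key: "Popup"})
if err == nil && linked != nil {
    popupAnnot := linked.LinkedAnnotation
    _ = popupAnnot // use the linked annotation handle here
}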

Then I read the values of all the keys of interest:

keys := []string{"Contents", "CreationDate", "Subj", "State", "RC", "T"}
var params []*common.Param
for _, key := range keys {
    // Only keys present in the annotation dictionary are reported.
    hasKey, err := instance.FPDFAnnot_HasKey(&requests.FPDFAnnot_HasKey{Annotation: annotation, Key: key})
    if err != nil {
        return nil, err
    }
    if !hasKey.HasKey {
        continue
    }
    param := &common.Param{Key: key}
    valueType, err := instance.FPDFAnnot_GetValueType(&requests.FPDFAnnot_GetValueType{Annotation: annotation, Key: key})
    if err != nil {
        return nil, err
    }
    param.Type = common.GetType(valueType.ValueType)
    // Read the value with the getter that matches its PDF object type.
    switch valueType.ValueType {
    case enums.FPDF_OBJECT_TYPE_STRING:
        value, err := instance.FPDFAnnot_GetStringValue(&requests.FPDFAnnot_GetStringValue{Annotation: annotation, Key: key})
        if err != nil {
            return nil, err
        }
        param.Value = value.Value
    case enums.FPDF_OBJECT_TYPE_NUMBER:
        value, err := instance.FPDFAnnot_GetNumberValue(&requests.FPDFAnnot_GetNumberValue{Annotation: annotation, Key: key})
        if err != nil {
            return nil, err
        }
        param.Value = fmt.Sprintf("%f", value.Value)
    }
    params = append(params, param)
}

which gives:

"params": [
    {
        "key": "Contents",
        "type": "STRING",
        "value": "Accepted set by omprakashseervi"
    },
    {
        "key": "CreationDate",
        "type": "STRING",
        "value": "D:20240320170027+05'30'"
    },
    {
        "key": "Subj",
        "type": "STRING",
        "value": "Sticky Note"
    },
    {
        "key": "State",
        "type": "STRING",
        "value": "Accepted"
    },
    {
        "key": "RC",
        "type": "STRING",
        "value": "<?xml version=\"1.0\"?><body xmlns=\"http://www.w3.org/1999/xhtml\" xmlns:xfa=\"http://www.xfa.org/schema/xfa-data/1.0/\" xfa:APIVersion=\"Acrobat:23.8.0\" xfa:spec=\"2.0.2\" ><p>Accepted set by omprakashseervi</p></body>"
    },
    {
        "key": "T",
        "type": "STRING",
        "value": "omprakashseervi"
    }
]
Answer by K J

You need to provide a minimal sample. In my emulation the comments are attached to the highlight, because notes are not related to the text; they simply cover an area. Perhaps more important, there is no direct mention of which font is to be used, since how an annotation's text is displayed is down to the reader's available fonts, and thus to its ability to display a local language.

Hence your output would only contain a rectangle, which is the common case for most annotations.


JSON is often an expanded subset of the real PDF data, so my emulation contains much more than you have seen so far, although much of it is duplication that could be deleted without problems. The link between all the sub-components is the name entry /NM(836487c4-8096-48bf-93db-c63e01ffa5c6).

Here it is as seen in the PDF, as objects 21 to 28 inclusive. (Note that the page number, here page [0], is not identified directly; each annotation only refers to the page object through /P 8 0 R.)

21 0 obj <</AP<</N 34 0 R>>/C[1 .666656 .75]/CA .399994/Contents(Here is the comment.)/CreationDate(D:20240320141805Z)/F 4/M(D:20240320142516Z)/NM(836487c4-8096-48bf-93db-c63e01ffa5c6)/P 8 0 R/Popup 22 0 R/QuadPoints[426.721 833.962 499.115 834.264 426.776 820.849 499.17 821.152]/RC(<?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:24.1.0" xfa:spec="2.0.2" ><p dir="ltr"><span dir="ltr" style="font-size:9.9pt;text-align:left;font-weight:normal;font-style:normal">Here is the comment.</span></p></body>)/Rect[426.309 820.438 499.582 834.676]/Subj(Highlight)/Subtype/Highlight/T(lez)/Type/Annot>> endobj
22 0 obj <</F 28/Open false/Parent 21 0 R/Rect[809.26 651.747 951.011 749.247]/Subtype/Popup/Type/Annot>> endobj
23 0 obj <</AP<</N 32 0 R/R 33 0 R>>/C[1 .819611 0]/Contents(Unmarked set by lez)/CreationDate(D:20240320141842Z)/F 30/IRT 21 0 R/M(D:20240320142256Z)/NM(343deb33-8149-4a18-b624-3b986c759c9a)/Name/Comment/P 8 0 R/Popup 24 0 R/RC(<?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:24.1.0" xfa:spec="2.0.2" ><p>Unmarked set by lez</p></body>)/Rect[100 76 124 100]/State(Unmarked)/StateModel(Marked)/Subj(Sticky Note)/Subtype/Text/T(lez)/Type/Annot>> endobj
24 0 obj <</F 30/Open false/Parent 23 0 R/Rect[1023.75 -14 1227.75 100]/Subtype/Popup/Type/Annot>> endobj
25 0 obj <</AP<</N 32 0 R/R 33 0 R>>/C[1 .819611 0]/Contents(Accepted set by lez)/CreationDate(D:20240320141950Z)/F 30/IRT 23 0 R/M(D:20240320141950Z)/NM(ab49acd6-9940-4af6-98a5-031ad272ef0c)/Name/Comment/P 8 0 R/Popup 26 0 R/RC(<?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:24.1.0" xfa:spec="2.0.2" ><p>Accepted set by lez</p></body>)/Rect[100 76 124 100]/State(Accepted)/StateModel(Review)/Subj(Sticky Note)/Subtype/Text/T(lez)/Type/Annot>> endobj
26 0 obj <</F 28/Open false/Parent 25 0 R/Rect[1023.75 -14 1227.75 100]/Subtype/Popup/Type/Annot>> endobj
27 0 obj <</AP<</N 32 0 R/R 33 0 R>>/C[1 .819611 0]/Contents(Completed set by lez)/CreationDate(D:20240320141957Z)/F 30/IRT 25 0 R/M(D:20240320141957Z)/NM(72877f21-e6e3-4f26-a8f6-222ee321b01a)/Name/Comment/P 8 0 R/Popup 28 0 R/RC(<?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:24.1.0" xfa:spec="2.0.2" ><p>Completed set by lez</p></body>)/Rect[100 76 124 100]/State(Completed)/StateModel(Review)/Subj(Sticky Note)/Subtype/Text/T(lez)/Type/Annot>> endobj
28 0 obj <</F 28/Open false/Parent 27 0 R/Rect[1023.75 -14 1227.75 100]/Subtype/Popup/Type/Annot>> endobj
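In the objects above, each sticky-note reply carries /IRT (in reply to): in the PDF it is an object reference such as /IRT 21 0 R, while the FDF export further down writes it as the parent's /NM string. Following that chain rebuilds the review history of one comment. A minimal stdlib-only Go sketch, assuming the values have already been read out of PDFium into a hypothetical Annot struct (for example NM, Contents and State via FPDFAnnot_GetStringValue, and InReplyTo as the /NM of the annotation presumably reachable via FPDFAnnot_GetLinkedAnnot with key "IRT"); Annot, example and thread are illustrative names, not part of go-pdfium:

```go
package main

import "fmt"

// Annot holds only the values this sketch needs. How it is filled from
// PDFium is an assumption (see the lead-in).
type Annot struct {
	NM        string // unique annotation name (/NM)
	InReplyTo string // /NM of the annotation replied to; "" for the root
	Contents  string
	State     string
}

// example mirrors the highlight and its three state replies shown above
// (popups omitted), deliberately listed out of order.
var example = []Annot{
	{NM: "ab49acd6-9940-4af6-98a5-031ad272ef0c", InReplyTo: "343deb33-8149-4a18-b624-3b986c759c9a", Contents: "Accepted set by lez", State: "Accepted"},
	{NM: "836487c4-8096-48bf-93db-c63e01ffa5c6", Contents: "Here is the comment."},
	{NM: "72877f21-e6e3-4f26-a8f6-222ee321b01a", InReplyTo: "ab49acd6-9940-4af6-98a5-031ad272ef0c", Contents: "Completed set by lez", State: "Completed"},
	{NM: "343deb33-8149-4a18-b624-3b986c759c9a", InReplyTo: "836487c4-8096-48bf-93db-c63e01ffa5c6", Contents: "Unmarked set by lez", State: "Unmarked"},
}

// thread returns the root annotation followed by its replies in order. It
// assumes a simple chain (at most one reply per annotation), as in this data.
func thread(annots []Annot, rootNM string) []Annot {
	byName := make(map[string]Annot)
	replyTo := make(map[string]Annot) // parent NM -> its reply
	for _, a := range annots {
		byName[a.NM] = a
		if a.InReplyTo != "" {
			replyTo[a.InReplyTo] = a
		}
	}
	root, ok := byName[rootNM]
	if !ok {
		return nil
	}
	out := []Annot{root}
	seen := map[string]bool{rootNM: true}
	cur := rootNM
	for {
		next, ok := replyTo[cur]
		if !ok || seen[next.NM] {
			break // end of the chain (or a cycle)
		}
		out = append(out, next)
		seen[next.NM] = true
		cur = next.NM
	}
	return out
}

func main() {
	for i, a := range thread(example, "836487c4-8096-48bf-93db-c63e01ffa5c6") {
		fmt.Printf("%d %q state=%q\n", i, a.Contents, a.State)
	}
}
```

If a document has several replies to the same annotation, the replyTo map would need to hold a slice per parent instead of a single entry.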

To get all the data for a comment, use the comment export in Acrobat Reader.


The exported FDF file contains all the data for this one group: 1 carrier (the catalog) plus 8 related annotation objects. Note also that the page is now identified for that file as /Page 0 (which in human numbering is page 1). Again, there is no textual data such as "TECHFACT" and no font names; the /F 28 and /F 30 entries are annotation flags, not fonts.

%FDF-1.2
%âãÏÓ
1 0 obj
<</FDF<</Annots[2 0 R 3 0 R 4 0 R 5 0 R 6 0 R 7 0 R 8 0 R 9 0 R]/F(/C/Users/lez/Downloads/demo/zrYRvg.pdf)/ID[<9FE283F5C5B975270B43C6C933DB9719><E8B214D3EA04131074E692BD65AED408>]/UF(/C/Users/lez/Downloads/demo/zrYRvg.pdf)>>/Type/Catalog>>
endobj
2 0 obj
<</C[1 0.666656 0.75]/CA 0.399994/Contents(Here is the comment.)/CreationDate(D:20240320141805Z)/F 4/M(D:20240320142516Z)/NM(836487c4-8096-48bf-93db-c63e01ffa5c6)/Page 0/Popup 3 0 R/QuadPoints[426.721 833.962 499.115 834.264 426.776 820.849 499.17 821.152]/RC(<?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:24.1.0" xfa:spec="2.0.2" ><p dir="ltr"><span dir="ltr" style="font-size:9.9pt;text-align:left;font-weight:normal;fo\
nt-style:normal">Here is the comment.</span></p></body>)/Rect[426.309 820.438 499.582 834.676]/Subj(Highlight)/Subtype/Highlight/T(lez)/Type/Annot>>
endobj
3 0 obj
<</F 28/Open false/Page 0/Parent 2 0 R/Rect[809.26 651.747 951.011 749.247]/Subtype/Popup/Type/Annot>>
endobj
4 0 obj
<</C[1 0.819611 0]/Contents(Unmarked set by lez)/CreationDate(D:20240320141842Z)/F 30/IRT(836487c4-8096-48bf-93db-c63e01ffa5c6)/M(D:20240320142256Z)/NM(343deb33-8149-4a18-b624-3b986c759c9a)/Name/Comment/Page 0/Popup 5 0 R/RC(<?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:24.1.0" xfa:spec="2.0.2" ><p>Unmarked set by lez</p></body>)/Rect[100 76 124 100]/State(Unmarked)/StateModel(Marked)/Subj(Sticky Note)/Subtype/Text/T(lez)/Type/Annot>>
endobj
5 0 obj
<</F 30/Open false/Page 0/Parent 4 0 R/Rect[1023.75 -14 1227.75 100]/Subtype/Popup/Type/Annot>>
endobj
6 0 obj
<</C[1 0.819611 0]/Contents(Accepted set by lez)/CreationDate(D:20240320141950Z)/F 30/IRT(343deb33-8149-4a18-b624-3b986c759c9a)/M(D:20240320141950Z)/NM(ab49acd6-9940-4af6-98a5-031ad272ef0c)/Name/Comment/Page 0/Popup 7 0 R/RC(<?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:24.1.0" xfa:spec="2.0.2" ><p>Accepted set by lez</p></body>)/Rect[100 76 124 100]/State(Accepted)/StateModel(Review)/Subj(Sticky Note)/Subtype/Text/T(lez)/Type/Annot>>
endobj
7 0 obj
<</F 28/Open false/Page 0/Parent 6 0 R/Rect[1023.75 -14 1227.75 100]/Subtype/Popup/Type/Annot>>
endobj
8 0 obj
<</C[1 0.819611 0]/Contents(Completed set by lez)/CreationDate(D:20240320141957Z)/F 30/IRT(ab49acd6-9940-4af6-98a5-031ad272ef0c)/M(D:20240320141957Z)/NM(72877f21-e6e3-4f26-a8f6-222ee321b01a)/Name/Comment/Page 0/Popup 9 0 R/RC(<?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:24.1.0" xfa:spec="2.0.2" ><p>Completed set by lez</p></body>)/Rect[100 76 124 100]/State(Completed)/StateModel(Review)/Subj(Sticky Note)/Subtype/Text/T(lez)/Type/Annot>>
endobj
9 0 obj
<</F 28/Open false/Page 0/Parent 8 0 R/Rect[1023.75 -14 1227.75 100]/Subtype/Popup/Type/Annot>>
endobj
trailer
<</Root 1 0 R>>
%%EOF

To show that the comments are related neither to a page position nor to the text: here the very same comment points into the empty space of a blank page that has nothing to show. The comments are independent of the page contents; each is simply a rectangle on a page, identified by its page number (+1 in human numbering).
