Extract comments and reference text from PDF using Python


I want to extract the comments from a PDF together with the text each comment refers to. This is the simple code I use to get the comments:

import fitz  # PyMuPDF

def extract_comments_from_pdf(file_path):
    document = fitz.open(file_path)
    comments = []

    for page_num in range(len(document)):
        page = document.load_page(page_num)

        # page.annots() yields each annotation on the page (nothing if there are none)
        for annot in page.annots():
            # Type code 8 is a highlight annotation (fitz.PDF_ANNOT_HIGHLIGHT);
            # sticky-note "Text" annotations use code 0 (fitz.PDF_ANNOT_TEXT)
            if annot.type[0] == fitz.PDF_ANNOT_HIGHLIGHT:
                comment = annot.info["content"]
                comments.append({
                    "page": page_num + 1,
                    "comment": comment
                })

    document.close()
    return comments

I don't know how to also get the reference text for each comment.
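One possible approach, sketched here under some assumptions: the comments are attached to highlight annotations, and the reference text is whatever lies under the highlighted area. A highlight's `vertices` hold four points per highlighted quad, so each quad can be reduced to a bounding box and passed as `clip` to `page.get_text()`. The names `quads_to_rects` and `extract_comments_with_text` are made up for illustration, and clipping by rectangle may pick up stray characters from neighbouring lines, so the result may need cleaning.

```python
def quads_to_rects(vertices):
    """Reduce a flat list of (x, y) points, four per quad, to bounding boxes."""
    rects = []
    for i in range(0, len(vertices) - 3, 4):
        xs = [p[0] for p in vertices[i:i + 4]]
        ys = [p[1] for p in vertices[i:i + 4]]
        rects.append((min(xs), min(ys), max(xs), max(ys)))
    return rects


def extract_comments_with_text(file_path):
    """Return page number, comment, and highlighted text for each highlight."""
    import fitz  # PyMuPDF; imported here so quads_to_rects is usable without it

    results = []
    with fitz.open(file_path) as document:
        for page in document:
            for annot in page.annots():
                # Assumption: only highlight annotations carry reference text
                if annot.type[0] != fitz.PDF_ANNOT_HIGHLIGHT:
                    continue
                # One quad per highlighted line; read the text under each one
                pieces = [
                    page.get_text("text", clip=fitz.Rect(r)).strip()
                    for r in quads_to_rects(annot.vertices or [])
                ]
                results.append({
                    "page": page.number + 1,
                    "comment": annot.info.get("content", ""),
                    "reference_text": " ".join(p for p in pieces if p),
                })
    return results
```

Calling `extract_comments_with_text("file.pdf")` (the file name is a placeholder) would return a list of dictionaries with `page`, `comment`, and `reference_text` keys. Sticky-note comments (type code 0) have no highlighted area, so this sketch skips them.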
