When executing this piece of code:
from pypdf import PdfReader, PdfWriter
import traceback

try:
    input_pdf = PdfReader(dwnld_filepath)  # path to the downloaded PDF
    output_pdf = PdfWriter()
    image = input_pdf.pages[0]             # first page of the input PDF
    output_pdf.add_page(image)             # raises PdfReadError (see traceback)
    output_pdf.write(file_path)
except Exception as e:
    traceback.print_exc()
This is the complete traceback I see:
Traceback (most recent call last):
  File "/Users/shafeerali/Documents/Nanonets/avanto/API/test.py", line 58, in <module>
    output_pdf.add_page(image)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/_writer.py", line 418, in add_page
    return self._add_page(page, list.append, excluded_keys)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/_writer.py", line 331, in _add_page
    page = cast("PageObject", page_org.clone(self, False, excluded_keys))
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 199, in clone
    d._clone(self, pdf_dest, force_duplicate, ignore_fields, visited)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 310, in _clone
    v.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 199, in clone
    d._clone(self, pdf_dest, force_duplicate, ignore_fields, visited)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 310, in _clone
    v.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 199, in clone
    d._clone(self, pdf_dest, force_duplicate, ignore_fields, visited)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 310, in _clone
    v.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_base.py", line 300, in clone
    obj.clone(pdf_dest, force_duplicate, ignore_fields)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 116, in clone
    arr.append(data.clone(pdf_dest, force_duplicate, ignore_fields))
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_base.py", line 292, in clone
    obj = self.get_object()
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_base.py", line 312, in get_object
    obj = self.pdf.get_object(self)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/_reader.py", line 1401, in get_object
    retval = read_object(self.stream, self)  # type: ignore
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 1280, in read_object
    return DictionaryObject.read_from_stream(stream, pdf, forced_encoding)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 538, in read_from_stream
    data["streamdata"] = read_unsized_from_steam(stream, pdf)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 432, in read_unsized_from_steam
    raise PdfReadError(
pypdf.errors.PdfReadError: Unable to find 'endstream' marker for obj starting at 13367.
Here are the PDF file(s) that cause the issue.
I'm the maintainer of pypdf (and PyPDF2).
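The traceback indicates that your PDF is broken: pypdf could not find the 'endstream' marker that should close the stream object starting at position 13367. You can verify that with a PDF validator such as https://www.pdf-online.com/osa/validate.aspx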
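As a rough local complement to an online validator, here is a minimal stdlib-only sketch that looks for the same symptom as this traceback: a `stream` keyword with no closing `endstream`. It also checks for the `%PDF-` header and a trailing `%%EOF` marker, whose absence often suggests a truncated file. The helper name `find_pdf_problems` is hypothetical and not part of pypdf; it is a byte-level heuristic that does not parse objects or honour `/Length`, so binary stream data can produce false hits, and its byte offsets are not guaranteed to match the position pypdf reports.

```python
import re
from typing import List

# The 'stream' keyword that opens a stream body is followed by CRLF or LF.
# The negative lookbehind skips the "stream" inside "endstream".
STREAM_START = re.compile(rb"(?<!end)stream\r?\n")
STREAM_END = b"endstream"


def find_pdf_problems(data: bytes) -> List[str]:
    """Return human-readable symptoms of damage found in raw PDF bytes.

    Heuristic only: no object parsing, no /Length check, no decompression.
    """
    problems: List[str] = []
    # Readers generally expect the header near the start of the file.
    if b"%PDF-" not in data[:1024]:
        problems.append("missing '%PDF-' header")
    # A complete file ends with %%EOF; a cut-off download usually lacks it.
    if b"%%EOF" not in data[-1024:]:
        problems.append("no '%%EOF' marker in the last 1024 bytes (truncated?)")
    for match in STREAM_START.finditer(data):
        body_start = match.end()
        end = data.find(STREAM_END, body_start)
        next_start = STREAM_START.search(data, body_start)
        # Unterminated if 'endstream' never appears, or only appears after
        # the next stream has already started.
        if end == -1 or (next_start is not None and next_start.start() < end):
            problems.append(
                f"stream starting at byte {match.start()} has no 'endstream'"
            )
    return problems
```

For example, `find_pdf_problems(open("broken.pdf", "rb").read())` (the file name is a placeholder) returns a list of strings; an empty list only means none of these particular symptoms were found, not that the PDF is valid.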
Although pypdf can cope with many defects, it will never be able to handle every kind of broken PDF document.
You can try to repair the PDF document before processing it; see "How can I fix/repair a corrupted PDF file?"