Every day, as part of my routine, I have to rename a number of PDF files with the serial numbers of laptops. I need some kind of script or software that opens a PDF file, copies a certain line (the service tag) from it, then closes the PDF and renames it with the copied serial number.

I tried to use the PyPDF2 package in Python, but my code is not working.

from PyPDF2 import PdfReader


reader = PdfReader("SNPDF.pdf")

page = reader.pages[0]
number_of_pages = len(reader.pages)

text = page.extract_text()



for x in text:
    if x.startswith("SN"):
        new_name = x.replace("SN", '')
      


print(new_name)

It worked with a .txt file, but I have no clue how to make it work with a PDF.

import os

serial = open('text.txt')
for line in serial:
    if line.startswith("SN"):
        new_line = line.replace('SN','')

serial.close()

os.rename('text.txt', new_line)



There are 3 best solutions below

0
Anton Petrov On

The following code should extract the service tag from the first page of the PDF file and rename the file with the extracted service tag.

import os

from PyPDF2 import PdfReader

FILE_TO_RENAME = "SNPDF.pdf"
SERVICE_TAG_PREFIX = "SN"

with open(FILE_TO_RENAME, "rb") as f:
    reader = PdfReader(f)
    # Replace "0" with the index of the page that contains the service tag
    page = reader.pages[0]
    text = page.extract_text()

# Search for the service tag in the extracted text
for line in text.split("\n"):
    if line.startswith(SERVICE_TAG_PREFIX):
        service_tag = line.partition(SERVICE_TAG_PREFIX)[2].strip()
        break
else:
    raise Exception(f"No service tag prefix found: {SERVICE_TAG_PREFIX}")

# Rename the PDF file with the service tag
os.rename(FILE_TO_RENAME, service_tag + ".pdf")
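
As a side note on why splitting on newlines matters: extract_text() returns one string, and looping over a string directly visits it character by character, so a test for the two-character prefix "SN" can never succeed. The short sketch below illustrates the difference; the sample text is invented for the demonstration and is not taken from a real PDF.

```python
# Invented sample standing in for the text extracted from a PDF page.
text = "Dell Latitude 5420\nSN 5CD1234XYZ\nWarranty: 3 years"

# Iterating over a string yields one character at a time,
# so no item can ever start with the two-character prefix "SN".
chars_matching = [c for c in text if c.startswith("SN")]

# Splitting into lines first gives whole lines to test.
lines_matching = [ln for ln in text.splitlines() if ln.startswith("SN")]

print(chars_matching)  # []
print(lines_matching)  # ['SN 5CD1234XYZ']
```

If the real PDF puts the label and the number on separate lines, the line-based test would need adjusting accordingly.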
1
911 On

I think the PyMuPDF package is a better fit. You need to import the "fitz" module to use it:

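import fitz  # PyMuPDF

pdf_file = "SNPDF.pdf"
new_name = None
doc = fitz.open(pdf_file)
for page in doc:
    # get_text() returns the whole page as one string, so check it line by line
    for line in page.get_text().splitlines():
        if line.startswith("SN"):
            new_name = line.replace("SN", "", 1).strip()
            break
    if new_name:
        break
doc.close()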
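
If the service tag does not sit at the very start of a line, a prefix test may miss it. A regular expression is one possible alternative. The sketch below assumes the serial number is made of letters, digits, and hyphens and follows "SN" with an optional colon or whitespace; both the pattern and the helper name find_serial are assumptions for illustration, so adjust them to the actual layout of your documents.

```python
import re
from typing import Optional

# Assumed format: "SN", optional colon/whitespace, then an alphanumeric serial.
SERIAL_PATTERN = re.compile(r"\bSN[:\s]*([A-Za-z0-9-]+)")


def find_serial(text: str) -> Optional[str]:
    """Return the first serial number found in text, or None (hypothetical helper)."""
    match = SERIAL_PATTERN.search(text)
    return match.group(1) if match else None


# Invented sample text standing in for page.get_text() output.
sample = "Model X\nService tag SN: 5CD1234XYZ\nWarranty: 3 years"
print(find_serial(sample))  # 5CD1234XYZ
```

Note that this pattern would also match a word that merely begins with "SN", so a stricter pattern may be needed if such words appear on the page.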
0
Jorj McKie On

import os

import fitz  # import PyMuPDF package

pdfnames = [...]  # list of pdf pathnames

for f in pdfnames:
    doc = fitz.open(f)
    page = doc[0]  # load first page
    words = page.get_text("words")  # extract text separated as strings w/o spaces
    doc.close()  # close the file for renaming it
    SN = None
    for i, word in enumerate(words[:-1]):  # stop early so words[i + 1] always exists
        if word[4] == "SN":  # search for SN identifier
            SN = words[i + 1][4]  # next word string should be serial number
            break
    if SN is None:
        print(f"no SN in file '{f}'.")
        continue
    os.rename(f, SN + ".pdf")
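
One thing to watch when renaming in bulk: os.rename(f, SN + ".pdf") moves the file into the current working directory if f contains a folder, and an extracted serial may contain characters that are not valid in a file name. A rough sketch of one way to handle both follows. The helper name target_path and the "_1", "_2" suffix scheme are made up for this example, not part of any library.

```python
import re
import tempfile
from pathlib import Path


def target_path(src: Path, tag: str) -> Path:
    """Build a collision-free '<tag>.pdf' path next to src (hypothetical helper)."""
    # Replace characters that are unsafe in file names on common systems.
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", tag.strip())
    if not safe:
        raise ValueError("empty serial number")
    candidate = src.with_name(safe + src.suffix)
    counter = 1
    # Append _1, _2, ... rather than overwrite an existing file.
    while candidate.exists() and candidate != src:
        candidate = src.with_name(f"{safe}_{counter}{src.suffix}")
        counter += 1
    return candidate


# Small demonstration in a temporary directory with a dummy file.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "scan.pdf"
    src.write_bytes(b"%PDF-1.4\n")
    dst = target_path(src, "5CD/1234")
    src.rename(dst)
    print(dst.name)  # 5CD_1234.pdf
```

In the loop above, this would replace the final line with something like Path(f).rename(target_path(Path(f), SN)).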