Text changes when I copy it from a searchable PDF file (created with the tesseract command) and paste it into Notepad

192 Views Asked by Muhammad Moinuddin At 27 July 2025 at 12:55

I created a searchable PDF file by running the following command on one of my images.

tesseract page.jpg test pdf --oem 1 --psm 5 -l urd

This is the image that I converted to a searchable PDF.

The image contains Urdu text, but when I copy the text from the newly created PDF file and paste it into any other text editor, this is what I get:

GehbFie”

Can anyone with Tesseract OCR and encoding experience help me solve this? Any help will be highly appreciated. Thanks in advance.


There is 1 best solution below

Muhammad Moinuddin On 16 October 2018 at 15:40 BEST ANSWER

The pdf argument is the name of a config file. It needs to come last in the command, after the options (--oem, --psm, -l, etc.).

The correct form of the command is the following.

tesseract page.jpg test --oem 1 --psm 5 -l urd pdf

This resolved my issue.
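
To avoid repeating the ordering mistake, the command can be wrapped in a small shell function. This is only a sketch: the function name ocr_to_pdf, the DRY_RUN variable, and the default language are hypothetical choices for illustration, not part of tesseract. The option values (--oem 1 --psm 5 -l urd) are the ones from the question.

```shell
# Hypothetical wrapper: ocr_to_pdf IMAGE OUTPUT_BASE [LANG]
# Builds the tesseract call with all options first and the
# config file name ("pdf") as the final argument.
ocr_to_pdf() {
    image=$1
    output_base=$2
    lang=${3:-urd}   # assumption: default to the language used in the question

    # Replace the positional parameters with the full command line.
    set -- tesseract "$image" "$output_base" --oem 1 --psm 5 -l "$lang" pdf

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Dry run: print the command instead of executing it.
        echo "$*"
    else
        "$@"
    fi
}
```

Setting DRY_RUN=1 before calling ocr_to_pdf page.jpg test prints the assembled command, which makes it easy to confirm that pdf comes last before actually running the OCR.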
