UnicodeDecodeError occurred using Tesseract OCR in Python 3.1


I'm trying to make a text recognition program and ran into this error.

Here's the code:

import os

from PIL import Image
import pytesseract

print("Enter File/Folder's Full Path")
file = input('> ')
print("Enter output Folder")
outfol = input('> ')
print()
print('Converting...')
print()
# os.system('cd /d ...') only changes directory inside a short-lived subshell;
# os.chdir changes it for this Python process, so test.pdf lands in outfol.
os.chdir(outfol)
pdf = pytesseract.image_to_pdf_or_hocr(file, extension='pdf', config='--psm 1')
with open('test.pdf', 'w+b') as f:
    f.write(pdf)  # pdf type is bytes by default
print()

Here's the error on the console:

Traceback (most recent call last):
  File "e:\Desktop\imgrec.py", line 133, in <module>
  File "e:\Desktop\imgrec.py", line 86, in mainmenu
    OCRmenu()
  File "e:\Desktop\imgrec.py", line 118, in OCRmenu
    pdf = pytesseract.image_to_pdf_or_hocr(file, extension='pdf', config='--psm 1')
  File "D:\apps\python\lib\site-packages\pytesseract\pytesseract.py", line 446, in image_to_pdf_or_hocr
    return run_and_get_output(*args)
  File "D:\apps\python\lib\site-packages\pytesseract\pytesseract.py", line 288, in run_and_get_output
    run_tesseract(**kwargs)
  File "D:\apps\python\lib\site-packages\pytesseract\pytesseract.py", line 264, in run_tesseract
    raise TesseractError(proc.returncode, get_errors(error_string))
  File "D:\apps\python\lib\site-packages\pytesseract\pytesseract.py", line 155, in get_errors
    line for line in error_string.decode(DEFAULT_ENCODING).splitlines()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 109: invalid start byte
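
For what it's worth, the traceback shows the exception is raised while pytesseract decodes Tesseract's own error output as UTF-8, so Tesseract had already failed before that point and the real message got lost. The last line can be reproduced without Tesseract at all. This is only a minimal sketch: the bytes in `raw` are made up, and cp1252 is an assumed encoding, not something confirmed from the actual output.

```python
# Hypothetical stand-in for the stderr bytes Tesseract produced;
# the real content is unknown.
raw = b"Error: caf\xa0"

# 0xa0 can only appear as a continuation byte in UTF-8, so a strict
# decode fails with "invalid start byte", as in the traceback.
try:
    raw.decode("utf-8")
    message = ""
except UnicodeDecodeError as exc:
    message = str(exc)

# In a single-byte Windows code page such as cp1252 (an assumption here),
# the same byte is a valid character: the non-breaking space U+00A0.
as_cp1252 = raw.decode("cp1252")
```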

I've tried adding encoding='utf-8' to the pytesseract.image_to_pdf_or_hocr call.

It returned: TypeError: image_to_pdf_or_hocr() got an unexpected keyword argument 'encoding'

By the way, I'm a beginner.
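
One way to see what Tesseract is actually complaining about might be to call the `tesseract` executable directly and decode its error output leniently. This is a diagnostic sketch, not a confirmed fix. It assumes `tesseract` is on PATH, and the helper names `decode_lenient` and `run_tesseract_raw` are made up for illustration, not part of pytesseract.

```python
import subprocess


def decode_lenient(raw: bytes) -> str:
    """Decode bytes as UTF-8, replacing undecodable bytes with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


def run_tesseract_raw(image_path: str, output_base: str):
    """Run the tesseract CLI and return (return code, readable stderr).

    Hypothetical helper; roughly mirrors
    image_to_pdf_or_hocr(..., extension='pdf', config='--psm 1').
    Tesseract appends '.pdf' to output_base itself.
    """
    proc = subprocess.run(
        ["tesseract", image_path, output_base, "--psm", "1", "pdf"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return proc.returncode, decode_lenient(proc.stderr)


# Example call (paths are placeholders):
# code, errors = run_tesseract_raw(r"e:\Desktop\scan.png", "test")
# print(code, errors)
```

With the lenient decode, the undecodable byte shows up as a replacement character instead of raising, so the rest of Tesseract's message stays readable.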
