I'm trying to make a text recognition (OCR) program and ran into this error.
Here's the code:
    import os

    from PIL import Image
    import pytesseract

    print("Enter File/Folder's Full Path")
    file = input('> ')
    print("Enter output Folder")
    outfol = input('> ')
    print()
    print('Converting...')
    print()

    # os.system('cd /d ' + outfol) runs in a child shell and does not change
    # this script's working directory, so build the output path explicitly.
    pdf = pytesseract.image_to_pdf_or_hocr(file, extension='pdf', config='--psm 1')
    with open(os.path.join(outfol, 'test.pdf'), 'wb') as f:
        f.write(pdf)  # pdf type is bytes by default
    print()
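The prompt asks for a file *or* folder path, but the path is handed to `image_to_pdf_or_hocr` as-is, and tesseract can only open an image file. Below is a minimal sketch, under stated assumptions, that checks the path first and, for a folder, produces one PDF per image in the output folder. `plan_pdf_outputs`, `convert_all` and the `IMAGE_EXTS` set are hypothetical names and choices, not part of pytesseract.

```python
import os

# Assumed set of image extensions to pick up from a folder (adjust as needed).
IMAGE_EXTS = {'.png', '.jpg', '.jpeg', '.tif', '.tiff', '.bmp'}


def plan_pdf_outputs(src, outfol):
    """Return (image_path, pdf_path) pairs for a single image or a folder of images."""
    if os.path.isdir(src):
        names = sorted(n for n in os.listdir(src)
                       if os.path.splitext(n)[1].lower() in IMAGE_EXTS)
        images = [os.path.join(src, n) for n in names]
    elif os.path.isfile(src):
        images = [src]
    else:
        raise FileNotFoundError(f'No such file or folder: {src!r}')
    # One PDF per image, named after the image, placed in the output folder.
    return [
        (img, os.path.join(outfol, os.path.splitext(os.path.basename(img))[0] + '.pdf'))
        for img in images
    ]


def convert_all(src, outfol):
    """OCR every planned image and write the resulting PDF bytes to disk."""
    import pytesseract  # imported here so plan_pdf_outputs works without it installed

    for img, pdf_path in plan_pdf_outputs(src, outfol):
        pdf = pytesseract.image_to_pdf_or_hocr(img, extension='pdf', config='--psm 1')
        with open(pdf_path, 'wb') as f:
            f.write(pdf)
```

Calling `convert_all(file, outfol)` in place of the last three lines of the script would raise a clear `FileNotFoundError` for a mistyped path instead of passing it on to tesseract.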
Here's the error from the console:
    Traceback (most recent call last):
      File "e:\Desktop\imgrec.py", line 133, in <module>
      File "e:\Desktop\imgrec.py", line 86, in mainmenu
        OCRmenu()
      File "e:\Desktop\imgrec.py", line 118, in OCRmenu
        pdf = pytesseract.image_to_pdf_or_hocr(file, extension='pdf', config='--psm 1')
      File "D:\apps\python\lib\site-packages\pytesseract\pytesseract.py", line 446, in image_to_pdf_or_hocr
        return run_and_get_output(*args)
      File "D:\apps\python\lib\site-packages\pytesseract\pytesseract.py", line 288, in run_and_get_output
        run_tesseract(**kwargs)
      File "D:\apps\python\lib\site-packages\pytesseract\pytesseract.py", line 264, in run_tesseract
        raise TesseractError(proc.returncode, get_errors(error_string))
      File "D:\apps\python\lib\site-packages\pytesseract\pytesseract.py", line 155, in get_errors
        line for line in error_string.decode(DEFAULT_ENCODING).splitlines()
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 109: invalid start byte
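Reading the traceback from the bottom up: the `UnicodeDecodeError` is raised inside pytesseract's `get_errors`, while it decodes tesseract's error output to build a `TesseractError`. So tesseract itself failed, and its message contains a byte (0xa0) that is not valid UTF-8, possibly because it was written in the system's code page. One way to see the hidden message is to run the tesseract CLI directly and decode its stderr leniently. This is a minimal diagnostic sketch, assuming the `tesseract` executable is on PATH (otherwise pass its full path); `decode_lenient` and `run_tesseract_pdf` are hypothetical helper names.

```python
import locale
import subprocess


def decode_lenient(raw: bytes) -> str:
    """Try UTF-8, then the system's preferred encoding, and finally
    replace undecodable bytes instead of raising."""
    for enc in ('utf-8', locale.getpreferredencoding(False)):
        try:
            return raw.decode(enc)
        except (UnicodeDecodeError, LookupError):
            continue
    return raw.decode('utf-8', errors='replace')


def run_tesseract_pdf(image_path, output_base, tesseract_cmd='tesseract'):
    """Run `tesseract <image> <output_base> --psm 1 pdf` and return
    (exit code, readable stderr). On success tesseract writes <output_base>.pdf."""
    proc = subprocess.run(
        [tesseract_cmd, image_path, output_base, '--psm', '1', 'pdf'],
        capture_output=True,
    )
    return proc.returncode, decode_lenient(proc.stderr)
```

For example, `code, err = run_tesseract_pdf(file, os.path.join(outfol, 'test'))` followed by `print(code, err)` should show tesseract's own complaint (bad path, unreadable image, missing language data, and so on) rather than the decoding error.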
I've tried adding `encoding='utf-8'` to the `pytesseract.image_to_pdf_or_hocr` call, but it returned:
TypeError: image_to_pdf_or_hocr() got an unexpected keyword argument 'encoding'
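That `TypeError` means the function simply has no `encoding` parameter. A quick way to check which keyword arguments a function accepts, before guessing, is the standard library's `inspect.signature` (or `help(func)`). A minimal sketch; `accepted_params` and `accepts_kwarg` are hypothetical helper names.

```python
import inspect


def accepted_params(func):
    """List the parameter names that func declares."""
    return list(inspect.signature(func).parameters)


def accepts_kwarg(func, name):
    """True if func can be called with name=... as a keyword argument."""
    for p in inspect.signature(func).parameters.values():
        if p.kind is inspect.Parameter.VAR_KEYWORD:
            return True  # a **kwargs parameter accepts any keyword
        if p.name == name and p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                                         inspect.Parameter.KEYWORD_ONLY):
            return True
    return False
```

Running `print(accepted_params(pytesseract.image_to_pdf_or_hocr))` lists the parameters your installed pytesseract version actually takes.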
By the way, I'm a new learner.