Using pytesseract OCR on PythonAnywhere for non-English languages


I am building an OCR website on PythonAnywhere where users can upload images of text and download the result in an editable format. It works perfectly for English, but when I try to add other languages (South Indian languages), I get an error.

I put my additional trained data in the folder "/home/wiltomalayalamocr/mysite/langfiles", which contains the file "mal.traineddata".

And in my code:

        import pytesseract
        from PIL import Image

        pytesseract.pytesseract.tesseract_cmd = r"/usr/bin/tesseract"
        # lang is "mal"; page segmentation mode 6 assumes a single uniform block of text
        custom_oem_psm_config = '-l {} --psm {} --tessdata-dir "/home/wiltomalayalamocr/mysite/langfiles"'.format(lang,6)
        text = pytesseract.image_to_string(Image.open(filename) , config=custom_oem_psm_config)

Here lang = "mal", but I get this error:

pytesseract.pytesseract.TesseractError: (1, 'Tesseract Open Source OCR Engine v3.04.01 with Leptonica Error opening data file /usr/share/tesseract-ocr/tessdata/mal.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory. Failed loading language \'mal\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')

I am using the Flask framework for Python.

Can anybody help me?

Best answer

After two days of searching and experimenting, I finally found the solution.

Setting the environment variable in a Bash console, as below, is not enough:

$ export TESSDATA_PREFIX=/home/wiltomalayalamocr/mysite/langfiles

That has no effect on the web app, because the app does not run in the console session where the variable was exported. The variable has to be set when the app itself loads, so I followed the PythonAnywhere help page on environment variables for web apps:

https://help.pythonanywhere.com/pages/environment-variables-for-web-apps/
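To illustrate why the variable has to be set inside the web app's own process, here is a minimal sketch. It is not part of my app, and the helper name `child_sees` is made up for this example. As far as I understand, pytesseract runs the tesseract binary as a child process, and a child process gets a copy of the environment of the process that starts it. So what matters is the environment of the Python process serving the app, not that of a separate console.

```python
import os
import subprocess
import sys


def child_sees(name):
    """Return the value a freshly started child process sees for `name`.

    Hypothetical helper, only here to demonstrate environment inheritance.
    """
    # By default a child process inherits a copy of the parent's environment.
    result = subprocess.run(
        [sys.executable, "-c",
         "import os, sys; sys.stdout.write(os.environ.get(sys.argv[1], ''))",
         name],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Set the variable in *this* process, before any OCR call is made.
os.environ["TESSDATA_PREFIX"] = "/home/wiltomalayalamocr/mysite/langfiles"
print(child_sees("TESSDATA_PREFIX"))
```

Loading the .env file in the WSGI file achieves the same thing: the variable ends up in `os.environ` of the web app process before tesseract is started.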

My project directory is /home/wiltomalayalamocr/mysite, and my .env file there contains the line:

export TESSDATA_PREFIX=/home/wiltomalayalamocr/mysite/langfiles

In the WSGI configuration file I added the following lines of code:

import os
from dotenv import load_dotenv
project_folder = os.path.expanduser('/home/wiltomalayalamocr/mysite')  # adjust as appropriate
load_dotenv(os.path.join(project_folder, '.env'))
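If the error still appears after reloading the web app, a quick check like the one below may help narrow it down. This is only a sketch under my own assumptions: `find_traineddata` is a hypothetical helper, not something pytesseract provides. It looks both in the directory named by TESSDATA_PREFIX and in a "tessdata" subdirectory of it, because, as far as I know, older Tesseract releases treat the variable as the parent of the "tessdata" directory (as the error message above says), while newer ones expect the directory that holds the .traineddata files.

```python
import os


def find_traineddata(lang, prefix=None):
    """Return the path to <lang>.traineddata under `prefix`, or None.

    Hypothetical diagnostic helper; `prefix` defaults to TESSDATA_PREFIX.
    """
    if prefix is None:
        prefix = os.environ.get("TESSDATA_PREFIX", "")
    if not prefix:
        return None
    filename = lang + ".traineddata"
    # Check the directory itself, then a "tessdata" subdirectory,
    # since the expected layout depends on the Tesseract version.
    candidates = [
        os.path.join(prefix, filename),
        os.path.join(prefix, "tessdata", filename),
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


# Prints the path that was found, or None if the variable is unset
# or the file is missing.
print(find_traineddata("mal"))
```

Calling this from inside the Flask app (for example, logging the result at startup) shows whether the web app process actually sees the variable and the language file.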