Why doesn't pytesseract recognize any text in this image?


I have this input image on which I am attempting to apply text detection and OCR. However, even after preprocessing (binary thresholding, etc.), pytesseract doesn't return any output. The purpose of the text detection is to improve the OCR output; I'm not too concerned with obtaining bounding boxes.

Here is my code below:

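import cv2
import pytesseract
from pytesseract import Output

image = cv2.imread('image.jpg')
grey = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
ret, thresh1 = cv2.threshold(grey, 127, 255, cv2.THRESH_BINARY)

# OCR runs on the non-inverted thresholded image
data = pytesseract.image_to_data(thresh1, output_type=Output.DICT)
# Inverted copy, created after the OCR call and not passed to Tesseract
inverted = cv2.bitwise_not(thresh1)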

Inspecting the results, the output is either empty or nonsensical. Is there any way to improve this?


There is 1 answer below.


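Try this code. It thresholds with cv2.THRESH_BINARY_INV instead of cv2.THRESH_BINARY, which inverts the result. Tesseract generally does best with dark text on a light background, so this should help if the text in your image is lighter than its background: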

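import cv2
import pytesseract

# Point pytesseract at the Tesseract executable (change this to your install path)
pytesseract.pytesseract.tesseract_cmd = r'k:\Tesseract\tesseract.exe'

image = cv2.imread('ccl6t.png')
grey = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Inverse binary threshold: pixels above 127 become black, all others white
ret, thresh1 = cv2.threshold(grey, 127, 255, cv2.THRESH_BINARY_INV)
cv2.imwrite('thresh.png', thresh1)  # save the thresholded image for inspection
# Without output_type, image_to_data returns a tab-separated string
words = pytesseract.image_to_data(thresh1, lang='eng', config='--psm 3 --oem 1')
print(words)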
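
The printed result is a tab-separated table with one row per page, block, paragraph, line, and word, so it can be hard to read at a glance. Here is a minimal sketch of one way to pull out just the recognized words and their confidences using only the standard library. The helper name words_from_tsv, the min_conf cutoff, and the sample string are my own illustrations and not part of pytesseract. The sample is hand-written in the same column layout and is not real OCR output from your image.

```python
import csv
import io


def words_from_tsv(tsv_text, min_conf=0.0):
    """Return (text, confidence) pairs for the word rows of image_to_data TSV output."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    words = []
    for row in reader:
        text = (row.get("text") or "").strip()
        try:
            conf = float(row.get("conf") or -1)
        except ValueError:
            continue  # skip rows whose confidence is not numeric
        # Rows for pages, blocks, paragraphs and lines have conf == -1 and no text.
        if text and conf >= min_conf:
            words.append((text, conf))
    return words


# Hand-written sample standing in for the string returned by image_to_data.
sample = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num"
    "\tleft\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t640\t480\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t36\t92\t60\t24\t96.5\tHello\n"
    "5\t1\t1\t1\t1\t2\t110\t92\t70\t24\t41.0\tw0rld\n"
)
print(words_from_tsv(sample, min_conf=60))  # [('Hello', 96.5)]
```

With the code above you would call words_from_tsv(words) instead of passing the sample. If the list comes back empty even with min_conf=0, Tesseract found no words at all, which usually points back at the preprocessing rather than at the parsing.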