I have the following image and want to extract the information from the table it contains. I managed to read the first and third columns, but I cannot get pytesseract to work on the second column.
Here is my code:
from PIL import Image, ImageFilter
import pytesseract
# image_address is the path to the image file
im = Image.open(image_address)
# First Column, WORKING
box_1 = (100, 435, 800, 490)
a = im.crop(box_1)
pytesseract.image_to_string(a)
# Second Column, NOT WORKING
box_2 = (810, 445, 1200, 490)
a = im.crop(box_2)
pytesseract.image_to_string(a)
I tried removing the gray background, but that did not help:
# Remove gray background, NOT WORKING
gray = a.convert('L')
bw_a = gray.point(lambda x: 0 if x < 128 else 255, '1')
pytesseract.image_to_string(bw_a)
I also tried dilation, which did not work either:
# Dilation, NOT WORKING
filter_a = bw_a.filter(ImageFilter.MinFilter(3))
pytesseract.image_to_string(filter_a)
However, the same approach works on the third column:
# Third Column, WORKING
box_3 = (1230, 445, 1500, 490)
a = im.crop(box_3)
pytesseract.image_to_string(a)
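One further variation along the same lines would be to pad the crop, upscale it, and tell Tesseract to treat it as a single line of text. The following is only a sketch: `pad_box` and `ocr_cell` are hypothetical helper names, the padding and scale values are guesses, and `--psm 7` is Tesseract's page segmentation mode for a single text line. I have not confirmed that this fixes the second column.

```python
def pad_box(box, pad, size):
    """Grow a (left, upper, right, lower) box by pad pixels, clamped to the image size."""
    left, upper, right, lower = box
    width, height = size
    return (
        max(left - pad, 0),
        max(upper - pad, 0),
        min(right + pad, width),
        min(lower + pad, height),
    )


def ocr_cell(im, box, pad=10, scale=3):
    """Crop one table cell, enlarge it, and OCR it as a single text line.

    pad and scale are assumed values, not tuned for this image.
    """
    # Imported here so pad_box can be used without Pillow or Tesseract installed.
    import pytesseract
    from PIL import Image

    cell = im.crop(pad_box(box, pad, im.size)).convert("L")
    # Upscale the small crop so the glyphs are larger before OCR.
    cell = cell.resize((cell.width * scale, cell.height * scale), Image.LANCZOS)
    # --psm 7: treat the image as a single text line.
    return pytesseract.image_to_string(cell, config="--psm 7").strip()
```

It would be called as `ocr_cell(im, box_2)` with the same `im` and `box_2` as above.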
Any thoughts on why the second column fails?