Improving OCR Accuracy on Preprocessed ROI in Python with Pytesseract


I'm working on a project where I need to preprocess a region of interest (ROI) on my screen to retrieve text more accurately using Pytesseract. I've managed to capture the screen and define ROIs based on template matching with OpenCV, but I'm struggling to preprocess these ROIs effectively for text recognition.

Here's the main function I'm using to preprocess the images and print the results from pytesseract.image_to_string():

import cv2
import pytesseract

### images is a dict of template images; frame is a numpy array containing the frame captured with pyautogui.screenshot() ###
def draw_boxes(images, frame):
    rectangle_color = (0, 255, 0)

    ### SEARCH FOR TEMPLATE ON SCREEN USING THE IMAGES FROM images.items() ###
    gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # the frame only needs converting once
    for name, img in images.items():
        gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

        result = cv2.matchTemplate(gray_frame, gray_img, cv2.TM_CCOEFF_NORMED)
        min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)

        ### IF TEMPLATE IS FOUND ON SCREEN, DEFINE A REGION OF INTEREST ###
        if max_val > 0.5:

            ### DRAW A RECTANGLE ON THE CV2 OUTPUT FOR DEBUG ###
            top_left = max_loc
            bottom_right = (top_left[0] + img.shape[1], top_left[1] + img.shape[0])
            cv2.rectangle(frame, top_left, bottom_right, rectangle_color, 2)

            ### DEFINING THE REGION OF INTEREST ###
            # note: the debug rectangle above is drawn before cropping,
            # so its border pixels end up inside the ROI
            x, y, w, h = max_loc[0], max_loc[1], img.shape[1], img.shape[0]
            roi = frame[y:y+h, x:x+w]

            ### PREPROCESSING THE REGION OF INTEREST ###
            roi_gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
            # medianBlur takes (src, ksize); this result is currently unused,
            # since the denoising below reads from roi_gray
            blurred_image = cv2.medianBlur(roi_gray, 3)
            denoised_image = cv2.fastNlMeansDenoising(roi_gray, None, 12, 7, 21)
            roi_processed = denoised_image

            ### APPLYING BINARY THRESHOLD FOR PYTESSERACT TO READ FROM ###
            # the threshold value is ignored when THRESH_OTSU is set
            #inverted_image = cv2.bitwise_not(roi_processed)
            otsu_threshold_value, thresh_image = cv2.threshold(
                roi_processed, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

            ### SHOW OCR VIEW FOR DEBUG AND DEVELOPMENT ###
            cv2.imshow('OCR View', thresh_image)
            cv2.waitKey(1)

            custom_config = r'--oem 3 --psm 11'  # psm 11: sparse text
            text = pytesseract.image_to_string(thresh_image, lang='eng', config=custom_config)

    ### text IS NEVER ASSIGNED IF NO TEMPLATE CLEARED THE 0.5 THRESHOLD ###
    try:
        print(text)
    except UnboundLocalError:
        print('unbound')
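The try/except at the end only exists because text is never assigned when no template clears the 0.5 match threshold. A minimal alternative sketch (the collect_text helper and its dict return are illustrative assumptions, not part of the original code) gathers one OCR result per matched template instead:

def collect_text(images, frame, threshold=0.5):
    """Return {template_name: ocr_text} for every template found on screen."""
    results = {}
    gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for name, img in images.items():
        gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        result = cv2.matchTemplate(gray_frame, gray_img, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(result)
        if max_val < threshold:
            continue  # template not on screen; nothing to OCR
        x, y = max_loc
        h, w = img.shape[:2]
        roi_gray = cv2.cvtColor(frame[y:y+h, x:x+w], cv2.COLOR_BGR2GRAY)
        _, thresh = cv2.threshold(roi_gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        results[name] = pytesseract.image_to_string(
            thresh, lang='eng', config=r'--oem 3 --psm 11')
    return results

With this shape, an empty dict simply means no template matched, and there is no UnboundLocalError to handle.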

Here's an example of the console output I get from Pytesseract:

Buy Offer
Ectoplasm
Gx
:
A substance from thase passing through the Underworld.
2
2
3379
Buy limit per 4 hours: 25,000
Quantity
Your price per item:
25,000
**bais**
85,375,000 coins
You have bought | so far for 3,415 coins.
125,000

I've noticed that numbers like "3,415" are sometimes misread as "bais" by Pytesseract. Below are images showing the ROI before and after preprocessing. The only thing I can think of is to make the processed image clearer, but I can't figure out how.

Here is the ROI without preprocessing:

[image: ROI without preprocessing]

Here is the ROI with preprocessing:

[image: ROI with preprocessing]
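One possibility for fields that are known to be numeric is to constrain Tesseract's character set with the standard tessedit_char_whitelist config variable, so a number cannot be read as letters like "bais". This is only a sketch and assumes you can crop down to a region containing just the number (number_crop below is a hypothetical name, not from the original code):

# Assumes number_crop is a thresholded crop containing only one numeric field.
# --psm 7 treats the image as a single text line; the whitelist limits
# recognition to digits and the thousands separator.
digits_config = r'--oem 3 --psm 7 -c tessedit_char_whitelist=0123456789,'
number_text = pytesseract.image_to_string(number_crop, lang='eng', config=digits_config)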

1 Answer

Answered by XENMS ACC:

Okay, after trying out denoising, dilating, sharpening, you name it, I tried resizing the image passed into:

pytesseract.image_to_string()

I added some blur, and it seems to be reading the ROI pretty accurately. Here is the new console output:

[image: new console output]

I feel like this made it much more reliable, but I have more testing to do.

To wrap this up: if the text in the image is really small, resize the image (or, in this case, the ROI).

Here is how I did it:

# This replaces the OCR step inside the matching loop; dimensions are taken
# from the original ROI, which thresh_image was produced from.
scale_factor = 3
width = int(roi.shape[1] * scale_factor)
height = int(roi.shape[0] * scale_factor)
dim = (width, height)

resized_roi = cv2.resize(thresh_image, dim, interpolation=cv2.INTER_LINEAR)
blurred_roi = cv2.medianBlur(resized_roi, 3)  # light blur to smooth upscaling artifacts

custom_config = r'--oem 3 --psm 6'  # psm 6: treat the ROI as a single uniform block of text
text = pytesseract.image_to_string(blurred_roi, lang='eng', config=custom_config)

This makes the image as a whole bigger, which also makes the text on it bigger. The blur is optional.
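To put the whole answer together, here is the resize-then-OCR step packaged as a standalone helper. This is a minimal sketch; the preprocess_and_read name and its defaults are illustrative rather than part of the answer:

import cv2
import pytesseract

def preprocess_and_read(roi, scale_factor=3, blur=True):
    """Threshold, upscale, and optionally blur a BGR ROI, then run Tesseract on it."""
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    _, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Upscale so small UI text is large enough for Tesseract.
    dim = (roi.shape[1] * scale_factor, roi.shape[0] * scale_factor)
    resized = cv2.resize(thresh, dim, interpolation=cv2.INTER_LINEAR)

    if blur:
        resized = cv2.medianBlur(resized, 3)  # smooth jagged upscaled edges

    return pytesseract.image_to_string(
        resized, lang='eng', config=r'--oem 3 --psm 6')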