Handwritten cross inside a square: image recognition


This is a long one. I have been trying (for far too long now) to differentiate between a crossed-out square, a completely blacked-out square, a blank square, a crossed-out/blacked-out circle and a blank circle, gathered from a scanned image like the one shown below. My current approach (link to the repo here; the focus is on the function detect_answers in evaluator.py, sorry for the occasional Italian comments/names) is:

  1. Find the outer black borders;
  2. Align the image to compensate for scanning misalignment (a minimal deskew sketch for this step follows the code below);
  3. Retrieve the ID barcode;
  4. Divide the whole image into small squares such that each one contains either a circle or a square;
  5. Classify each square into one of the categories mentioned above (crossed-out square in green, completely blacked-out square in blue, crossed-out/blacked-out circle in red; blank squares and blank circles can be left unmarked).
    import os
    from typing import Dict, List, Tuple

    import cv2
    import numpy as np
    from joblib import load


    def detect_answers(bgr_image: np.ndarray, bgr_img_for_debug: np.ndarray,
                       x_cut_positions: List[int], y_cut_positions: Tuple[int, ...],
                       is_60_question_sim: bool, debug: str) -> Dict[int, str]:
        question_multiplier: int = 15 if is_60_question_sim else 20

        letter: Tuple[str, ...] = ("L", "", "A", "B", "C", "D", "E")
        user_answer_dict: Dict[int, str] = {i: "" for i in range(1, 61 - 20 * int(not is_60_question_sim))}

        gr_image: np.ndarray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)

        # Load the pre-trained SVM model
        clf = load(os.path.join(os.getcwd(), "reduced.joblib"))

        for y_index in range(len(y_cut_positions) - 1):
            for x_index in range(len(x_cut_positions) - 1):

                # Skip the columns that contain only the question numbers
                if not (x_index - 1) % 7:
                    continue

                # Trim a few pixels off the first cell of each block so the
                # printed border does not leak into the crop
                x_top_left = int(not x_index % 7) * 7 + x_cut_positions[x_index]
                x_bottom_right = int(not x_index % 7) * 2 + x_cut_positions[x_index + 1]

                y_top_left: int = y_cut_positions[y_index]
                y_bottom_right: int = y_cut_positions[y_index + 1]

                crop_for_prediction: np.ndarray = gr_image[y_top_left:y_bottom_right, x_top_left:x_bottom_right]
                crop_for_prediction = cv2.resize(crop_for_prediction, (18, 18))

                # category = ("QB", "QS", "QA", "CB", "CA")
                #               0     1     2     3     4

                # Feature vector: the flattened 18x18 crop, plus the column
                # index within its block and the crop's mean intensity
                crop_for_prediction = np.append(crop_for_prediction,
                                                [x_index % 7, int(np.mean(crop_for_prediction))])
                predicted_category_index: int = clf.predict([crop_for_prediction])[0]
                # (the mapping of predicted_category_index into user_answer_dict
                # is omitted from this snippet)
        return user_answer_dict
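
For context on steps 1 and 2 above, here is a minimal deskew sketch. It is only an illustration under stated assumptions, not the repo's actual code: it assumes the outer black border is the largest contour in the thresholded scan, and since the angle convention of minAreaRect changed across OpenCV versions, the sign of the correction may need flipping:

    import cv2
    import numpy as np


    def deskew(bgr_image: np.ndarray) -> np.ndarray:
        """Rotate the scan so that the outer black border is axis-aligned."""
        gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
        # Invert so the printed black border becomes the foreground
        _, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

        contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        border = max(contours, key=cv2.contourArea)  # assumption: border = largest contour

        # Fold the angle into the smallest correction around 0 degrees
        angle = cv2.minAreaRect(border)[-1]
        if angle > 45:
            angle -= 90
        elif angle < -45:
            angle += 90

        h, w = bgr_image.shape[:2]
        rotation = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        return cv2.warpAffine(bgr_image, rotation, (w, h),
                              flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)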

From detect_answers I am expecting to end up with the position of each relevant coloured square and its category. Currently this is done via a model trained on previously hand-labelled data (the file can be found in the GitHub repo). I have tried many different approaches (mean of each square, counting black pixels, ...) but no method seems reliable enough to handle all the variation that handwriting introduces. Furthermore, the algorithm should be quite fast, since each run needs to evaluate 500-800 tests.
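
On the speed front, one cheap win might be to collect all the feature vectors first and call clf.predict once on the whole batch, since scikit-learn classifiers handle a single call on an (n_samples, n_features) matrix much faster than hundreds of single-row calls. A rough sketch of the loop rewritten that way (the edge-trimming offsets from the original loop are omitted for brevity):

    # Collect one 326-element feature vector per cell, then classify in one go
    features = []
    for y_index in range(len(y_cut_positions) - 1):
        for x_index in range(len(x_cut_positions) - 1):
            if not (x_index - 1) % 7:  # skip the question-number columns
                continue
            crop = gr_image[y_cut_positions[y_index]:y_cut_positions[y_index + 1],
                            x_cut_positions[x_index]:x_cut_positions[x_index + 1]]
            crop = cv2.resize(crop, (18, 18)).ravel()
            features.append(np.append(crop, [x_index % 7, int(crop.mean())]))

    # One vectorised call instead of one predict() per cell
    predicted_categories = clf.predict(np.asarray(features))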

[Sample input]

[Expected output]

[Correct classification, ignore the red numbers]

[Wrong classification] In particular, question 1D should be surrounded by red contours (since it's a blacked-out square), and almost everything in the last column was not classified correctly.
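
For what it's worth, the coloured debug contours can be drawn inside the classification loop with a small category-to-colour map. The mapping below is only my inference from the category comment in the code and the colours listed in step 5 (QS = crossed square, QA = blacked square, CA = marked circle), so adjust it to whatever the model actually emits:

    # BGR colours per predicted category index; blanks (QB = 0, CB = 3) are skipped
    category_colour = {1: (0, 255, 0),   # QS: crossed-out square -> green
                       2: (255, 0, 0),   # QA: blacked-out square -> blue
                       4: (0, 0, 255)}   # CA: marked circle      -> red

    colour = category_colour.get(predicted_category_index)
    if colour is not None:
        cv2.rectangle(bgr_img_for_debug,
                      (x_top_left, y_top_left),
                      (x_bottom_right, y_bottom_right),
                      colour, thickness=2)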

I'm at your disposal for any question, clarification or discussion. Cheers
