Detecting handwritten lines using OpenCV

394 Views Asked by At

I'm trying to detect underlines and boxes in the following image:

enter image description here

For example, this is the output I'm aiming for:

enter image description here

Here's what I've atempted:

import cv2
import numpy as np

# Load image
img = cv2.imread('document.jpg')

# Convert image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Apply Gaussian blur to reduce noise
blur = cv2.GaussianBlur(gray, (3, 3), 0)

# Threshold the image
ret, thresh = cv2.threshold(blur, 127, 255, cv2.THRESH_TRUNC)

# Apply Canny Edge Detection
edges = cv2.Canny(thresh, 155, 200)

# Use HoughLinesP to detect lines
lines = cv2.HoughLinesP(edges, rho=1, theta=1*np.pi/180, threshold=100, minLineLength=100, maxLineGap=50)

# Draw lines on image
for line in lines:
    x1, y1, x2, y2 = line[0]
    cv2.line(img, (x1, y1), (x2, y2), (0, 0, 255), 4)

However, this is the result I get:

enter image description here

Here are my thoughts regarding this problem:

  1. I might need to use adaptive thresholding with Otsu's algorithm to provide a proper binary image to cv2.Canny(). However, I doubt this is the core issue. Here is how the image looks with the current thresholding applied:

enter image description here

cv2.threshold() already does a decent job separating the notes from the page.

  1. Once I get HoughLinesP() to properly draw all the lines (and not the weird scribbles it's currently outputting), I can write some sort of box detector function to find the boxes based on the intersections (or near-intersections) of four lines. As for underlines, I simply need to detect horizontal lines from the output of HoughLinesP(), which shouldn't be difficult (e.g., for any given line, check if the two y coordinates are within some range of each other).

So the fundamental problem I have is this: how do I get HoughLinesP() to output smoother lines and not the current mess it's giving so I can then move forward with detecting boxes and lines?

Additionally, do my proposed methods for finding boxes and underlines make sense from an efficiency standpoint? Does OpenCV provide a better way for achieving what I want to accomplish?

0

There are 0 best solutions below