Edge detection with GIMP/Sobel vs. OpenCV/Sobel

Using the C++ API of OpenCV 4.9.0 on Ubuntu 22.04, I am trying to automate detection of the multiple store receipts depicted in this first scanned image, copy out the resulting sub-images, rotate them, and feed them to an OCR routine for further processing:

receipts_original_scan.png

The green areas were not in the original but serve to mask any PII which might be present in the receipts.

I have tried code from various examples in the OpenCV documentation and here on Stack Overflow, but I had a hard time getting it to work properly until I moved the image pre-processing step into GIMP. There, after some experimentation, I found the "Filters->Edge-Detect->Edge..." option with the Sobel algorithm to be the most successful at generating an image I can feed to the C++ code I have written (shown further below).

Here is the GIMP dialog with the settings I used (algorithm=Sobel, amount=10, border behavior="clamp"):

receipts_gimp_edge_dialog.png

The resulting image:

receipts_gimp_sobel_10_replace.png

Here is the binary image after applying a threshold of 35:

receipts_after_threshold_35.png
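
For reference, my rough understanding of that GIMP pipeline, expressed in OpenCV terms, would be something like the sketch below (the function and parameter names are just illustrative). I am assuming that GIMP's Sobel edge-detect essentially computes the gradient magnitude scaled by the "amount" setting, and that the "clamp" border behavior corresponds to BORDER_REPLICATE; the exact scaling is a guess on my part.

#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>

// Sketch: approximate GIMP's "Edge (Sobel, amount=10, clamp)" followed by a threshold of 35.
// Assumptions: GIMP's output ~ gradient magnitude * amount, "clamp" ~ BORDER_REPLICATE.
cv::Mat gimpLikeEdges(const cv::Mat& bgr, double amount = 10.0, double thresh = 35.0)
{
  cv::Mat gray, gx, gy, mag, edges, binary;
  cv::cvtColor(bgr, gray, cv::COLOR_BGR2GRAY);
  cv::Sobel(gray, gx, CV_32F, 1, 0, 3, 1, 0, cv::BORDER_REPLICATE);
  cv::Sobel(gray, gy, CV_32F, 0, 1, 3, 1, 0, cv::BORDER_REPLICATE);
  cv::magnitude(gx, gy, mag);                 // sqrt(gx^2 + gy^2)
  mag.convertTo(edges, CV_8U, amount / 8.0);  // scaling by amount/8 is a guess to stay within 0-255
  cv::threshold(edges, binary, thresh, 255, cv::THRESH_BINARY);
  return binary;
}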

And finally, the OpenCV code generated these rotated rectangles around the receipts (rectangles which were too small or too large are discarded):

receipts_rects_found.png

So here is my question: how can I get OpenCV to produce much the same result from my original image as GIMP does?

Here is my code, adapted from OpenCV examples as well as this SO thread:

#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/highgui.hpp>
#include <cstdlib>  // strtol
#include <iostream>

using namespace cv;
using namespace std;

int main(int argc, char* argv[]) {

  int min_area = 10000;  // discard rotated rects with an area smaller than this (in pixels)
  int max_height = 800;  // resize the input down to this height if it is taller
  double thr = 220;      // default binary threshold; can be overridden via argv[2]

  Mat img, orig_img;
  
  if (argc < 2) {
    cout << "Usage: " << argv[0] << " [FILE PATH] [THRESHOLD (1-255, optional)]" << endl;
    return 0;
  } else {
    orig_img = imread(argv[1]);
    if ( orig_img.empty() ) {
      cout << "WARNING: the input image was empty!" << endl;
      return EXIT_FAILURE;
    }
    img = orig_img;
  }
  
  if (argc > 2) {
    long val = strtol(argv[2], nullptr, 10);
    if (val > 0 && val < 256) {
      thr = val;
    }
  }

  //-------------------------------------------
  // Resize if height is larger than 800 pixels:
  Mat tmp;
  double h = img.size().height;
  double w = img.size().width;
  double df = 1.0;
  cout << "\nOriginal image height:\t" << static_cast<int>(h) << endl;
  cout << "Original image width:\t" << static_cast<int>(w) << "\n" << endl;
  if (h > max_height) {
    df = (double)max_height / h;
    resize(img, tmp, Size(), df, df, INTER_NEAREST_EXACT);
    img = tmp;
  }

  Mat img2, img3;
  cvtColor(img, img2, COLOR_BGR2GRAY);
  blur( img2, img3, Size(3,3) );
  threshold(img3, img2, thr, 255, THRESH_BINARY);
  
  Mat element = getStructuringElement(MORPH_CROSS, Size(3, 3), Point(1, 1));
  erode(img2, img2, element); // without it find contours fails on some rects
  
  // Show images:
  imshow("img", img);
  imshow("img2", img2);
  waitKey();
  
  // preprocessing done, search rectangles
  vector<vector<Point> > contours;
  
  // vector<Vec4i> hierarchy;
  findContours(img2, contours, /* hierarchy, */ RETR_EXTERNAL, CHAIN_APPROX_SIMPLE);
  
  vector<RotatedRect> rects;
  for (size_t i = 0; i < contours.size(); i++) {
    // if (hierarchy[i][2] > 0) continue;
  
    // capture inner contour
    RotatedRect rr = minAreaRect(contours[i]);
    if (rr.size.area() < min_area) continue; // too small
  
    rr.size.width += 8;
    rr.size.height += 8; // expand to the outer rect if needed
    rects.push_back(rr);
    
    cout << "***************\nRectangle dimensions:"
         << "\nRect " << i << ":"
         << "\n\twidth:\t" << rr.size.width
         << "\n\theight:\t" << rr.size.height
         << "\n\tarea:\t" << rr.size.width * rr.size.height
         << "\n***************\n"
         << endl;
  }
  
  Mat debugImg;
  img.copyTo(debugImg);

  for (RotatedRect rr : rects) {
    Point2f points[4];
    rr.points(points);
    for (int i = 0; i < 4; i++) {
      int ii = (i + 1) % 4;
      line(debugImg, points[i], points[ii], CV_RGB(255, 0, 0), 2);
    }
  }
  imshow("debug", debugImg);
  waitKey();
  return EXIT_SUCCESS;
}

Thanks for helping!

EDIT:

In the meantime, I was able to get better results by using the Scharr operator in OpenCV and playing with the parameters. My goal is to make the process as automated as possible, requiring minimal input from the user.
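
(As an aside, my understanding is that passing ksize = -1 to Sobel, which the 'R' key does in the code below, makes OpenCV use the 3x3 Scharr kernel, so those two Sobel calls should then be equivalent to calling Scharr() directly. A minimal sketch, with an illustrative helper name:)

#include <opencv2/imgproc.hpp>

// Sketch: compute x/y gradients with the dedicated Scharr() function, which (as I
// understand it) matches Sobel(..., ksize = -1) with the same scale/delta values.
void scharrGradients(const cv::Mat& gray, cv::Mat& grad_x, cv::Mat& grad_y,
                     double scale = 1.0, double delta = 0.0)
{
  cv::Scharr(gray, grad_x, CV_16S, 1, 0, scale, delta, cv::BORDER_DEFAULT);
  cv::Scharr(gray, grad_y, CV_16S, 0, 1, scale, delta, cv::BORDER_DEFAULT);
}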

Here is what I have so far (adapted from the OpenCV Sobel example found here):

#include "opencv2/imgproc.hpp"
#include "opencv2/imgcodecs.hpp"
#include "opencv2/highgui.hpp"
#include <iostream>
using namespace cv;
using namespace std;
int main( int argc, char** argv )
{
  cv::CommandLineParser parser(argc, argv,
                               "{@input   |lena.jpg|input image}"
                               "{ksize   k|1|ksize (hit 'K' to increase its value at run time)}"
                               "{scale   s|1|scale (hit 'S' to increase its value at run time)}"
                               "{delta   d|0|delta (hit 'D' to increase its value at run time)}"
                               "{help    h|false|show help message}");
  cout << "The sample uses Sobel or Scharr OpenCV functions for edge detection\n\n";
  parser.printMessage();
  cout << "\nPress 'ESC' to exit program.\nPress 'R' to reset values ( ksize will be -1 equal to Scharr function )";
  // First we declare the variables we are going to use
  Mat image, src, src_gray;
  Mat grad;
  const String window_name = "Sobel Demo - Simple Edge Detector";
  int ksize = parser.get<int>("ksize");
  int scale = parser.get<int>("scale");
  int delta = parser.get<int>("delta");
  int ddepth = CV_16S;
  String imageName = parser.get<String>("@input");
  // As usual we load our source image (src)
  image = imread( samples::findFile( imageName ), IMREAD_COLOR ); // Load an image
  // Check if image is loaded fine
  if( image.empty() )
  {
    printf("Error opening image: %s\n", imageName.c_str());
    return EXIT_FAILURE;
  }

  //-------------------------------------------
  // Resize if height is larger than 800 pixels:
  int max_height = 800;
  Mat tmp;
  double h = image.size().height;
  double w = image.size().width;
  double df = 1.0;
  cout << "\nOriginal image height:\t" << static_cast<int>(h) << endl;
  cout << "Original image width:\t" << static_cast<int>(w) << "\n" << endl;
  if (h > max_height) {
    df = (double)max_height / h;
    resize(image, tmp, Size(), df, df, INTER_NEAREST_EXACT);
    image = tmp;
  }

  double thr = 125; // Start with a threshold value somewhere between 0 and 255

  for (;;)
  {
    // Remove noise by blurring with a Gaussian filter ( kernel size = 3 )
    GaussianBlur(image, src, Size(3, 3), 0, 0, BORDER_DEFAULT);
    // Convert the image to grayscale
    cvtColor(src, src_gray, COLOR_BGR2GRAY);
    Mat grad_x, grad_y;
    Mat abs_grad_x, abs_grad_y;
    Sobel(src_gray, grad_x, ddepth, 1, 0, ksize, scale, delta, BORDER_DEFAULT);
    Sobel(src_gray, grad_y, ddepth, 0, 1, ksize, scale, delta, BORDER_DEFAULT);
    // converting back to CV_8U
    convertScaleAbs(grad_x, abs_grad_x);
    convertScaleAbs(grad_y, abs_grad_y);
    addWeighted(abs_grad_x, 0.5, abs_grad_y, 0.5, 0, grad);
    
    imshow(window_name, grad);
    char key = (char)waitKey(0);
    
    int old_ksize, old_scale, old_delta, old_threshold;
    
    if(key == 27)
    {
      break;
    }
    if (key == 'k' || key == 'K')
    {
      old_ksize = ksize;
      ksize = ksize < 30 ? ksize+2 : -1;
      cout << "Changed 'ksize' from " << old_ksize << " to " << ksize << endl;
    }
    if (key == 's' || key == 'S')
    {
      old_scale = scale;
      scale++;
      cout << "Changed 'scale' from " << old_scale << " to " << scale << endl;
    }
    if (key == 'd' || key == 'D')
    {
      old_delta = delta;
      delta++;
      cout << "Changed 'delta' from " << old_delta << " to " << delta << endl;
    }
    if (key == 'r' || key == 'R')
    {
      scale =  1;
      ksize = -1;
      delta =  0;
      cout << "Reset to Scharr algorithm." << endl;
    }
    if (key == '+') {
      old_threshold = (int)thr;
      if (thr < 250) {
        thr += 5;
        cout << "Increasing threshold from " << old_threshold << " to " << (int)thr << endl;
        key = 'f';
      }
    }
    if (key == '-') {
      old_threshold = (int)thr;
      if (thr > 5) {
        thr -= 5;
        cout << "Decreasing threshold from " << old_threshold << " to " << (int)thr << endl;
        key = 'f';
      }
    }
    if (key == 'f' || key == 'F') {
      Mat img2, img3;
      int min_area = 10000;
      // not necessary here:
      // cvtColor(img, img2, COLOR_BGR2GRAY);
      blur( grad, img2, Size(3,3) );
      threshold(img2, img3, thr, 255, THRESH_BINARY);
      
      Mat element = getStructuringElement(MORPH_CROSS, Size(3, 3), Point(1, 1));
      erode(img3, img3, element); // without it find contours fails on some rects

      // preprocessing done, search rectangles
      vector<vector<Point> > contours;
      
      // vector<Vec4i> hierarchy;
      findContours(img3, contours, /* hierarchy, */ RETR_EXTERNAL, CHAIN_APPROX_SIMPLE);
      
      vector<RotatedRect> rects;
      for (size_t i = 0; i < contours.size(); i++) {
        // if (hierarchy[i][2] > 0) continue;
      
        // capture inner contour
        RotatedRect rr = minAreaRect(contours[i]);
        if (rr.size.area() < min_area) continue; // too small?
        if (rr.size.width > img3.size().width - 5) continue; // rectangle encloses entire image?
      
        rr.size.width += 8;
        rr.size.height += 8; // expand to the outer rect if needed
        rects.push_back(rr);
        
        cout << "***************\nRectangle dimensions found:"
             << "\nRect " << i << ":"
             << "\n\twidth:\t" << rr.size.width
             << "\n\theight:\t" << rr.size.height
             << "\n\tarea:\t" << rr.size.width * rr.size.height
             << "\n***************\n"
             << endl;
      }
      
      Mat debugImg;
      image.copyTo(debugImg);
    
      for (RotatedRect rr : rects) {
        Point2f points[4];
        rr.points(points);
        for (int i = 0; i < 4; i++) {
          int ii = (i + 1) % 4;
          line(debugImg, points[i], points[ii], CV_RGB(255, 0, 0), 2);
        }
      }
      imshow("debug", debugImg);
      waitKey();
      destroyWindow("debug");
    }
  }

  cout << "\n************************\nParameters:"
       << "\n\tksize     = " << ksize
       << "\n\tscale     = " << scale
       << "\n\tdelta     = " << delta
       << "\n\tthreshold = " << thr
       << "\n************************"
       << endl;

  return EXIT_SUCCESS;
}

The threshold starts at 125 and can be incremented or decremented in steps of 5 by pressing "+" or "-"; this also immediately draws any detected rectangles on the original image and shows the result in a new window. Changing one of the other parameters works as in the original Sobel example, but then the "f" or "F" key must be pressed to display the rectangle overlays.

The best result so far comes from using Scharr (passing -1 as ksize) with scale=5 and delta=1 and decreasing the threshold to 120. But I am still wondering how to automate everything ... at most, the user might be asked how many receipts were scanned and be told to make sure they do not overlap; all of the other parameters should, if possible, be determined automatically from the image.
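
One idea I am considering for getting rid of the manual threshold (a sketch only, not yet verified on my scans): let Otsu's method pick the threshold from the gradient image instead of stepping it with "+"/"-". Here 'grad' stands for the CV_8U gradient image from the code above; the helper name is just illustrative.

#include <opencv2/imgproc.hpp>
#include <iostream>

// Sketch: automatic threshold selection on the gradient image via Otsu's method.
cv::Mat autoBinarize(const cv::Mat& grad)
{
  cv::Mat blurred, binary;
  cv::blur(grad, blurred, cv::Size(3, 3));
  // With THRESH_OTSU, OpenCV computes the threshold itself; the 0 passed here is ignored.
  double chosen = cv::threshold(blurred, binary, 0, 255,
                                cv::THRESH_BINARY | cv::THRESH_OTSU);
  std::cout << "Otsu chose threshold " << chosen << std::endl;
  return binary;
}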

Thanks.

EDIT 2:

This SO post helped me further. Using this modified Scharr kernel:

 -5  0   5 
-13  0  13
 -5  0   5

and the code in the last answer to that question, I was able to generate the following image from the original (after also applying a light bilateral filter to it). It is quite similar to what I was able to achieve in GIMP:

bilateral filter with modified Scharr kernel
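
For completeness, the filtering step I ended up with looks roughly like the sketch below. The kernel values are the ones listed above; the bilateralFilter parameters and the helper name are my own choices and only meant as an illustration.

#include <opencv2/imgproc.hpp>

// Sketch: light bilateral filter on the original, then the modified Scharr kernel
// applied in x and y with filter2D, combined into a gradient magnitude image.
cv::Mat modifiedScharrEdges(const cv::Mat& bgr)
{
  cv::Mat smoothed, gray, gx, gy, mag, edges;
  cv::bilateralFilter(bgr, smoothed, 5, 50, 50);  // "light" bilateral filter (values are guesses)
  cv::cvtColor(smoothed, gray, cv::COLOR_BGR2GRAY);

  // Modified Scharr kernel for the x direction; the y kernel is its transpose.
  cv::Mat kx = (cv::Mat_<float>(3, 3) <<  -5, 0,  5,
                                         -13, 0, 13,
                                          -5, 0,  5);
  cv::Mat ky = kx.t();
  cv::filter2D(gray, gx, CV_32F, kx);
  cv::filter2D(gray, gy, CV_32F, ky);

  cv::magnitude(gx, gy, mag);
  cv::normalize(mag, edges, 0, 255, cv::NORM_MINMAX, CV_8U);
  return edges;
}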

Suggestions for simplification would be welcome.
