I'm trying to detect a known object by comparing the current frame of a video against pre-stored feature descriptors. My idea is to match the features of the current frame against the prestored features and report a positive detection when the number of good matches is above some threshold.
This, however, doesn't seem to work, because DescriptorMatcher always reports roughly the same number of matches regardless of whether the object is actually in the scene. Even though I use a fairly conventional filtering approach to keep only the top x good matches, that count is still a metric relative to the current frame, not an absolute one.
Is there something like a goodness-of-match score from DescriptorMatcher that I could use as a hard threshold? Or is there a better approach to this? I have used Bag of Words before, but it seems overkill for the problem at hand, and it's also too computationally expensive for my needs. Any suggestions/tips/pointers would be appreciated.
ImageDescriptor imageDescriptor = ImageDescriptor.fromJson(jsonMetadata);
Mat storedDescriptors = imageDescriptor.getFeatureDescriptors(); // prestored features
FeatureDetector featureDetector = FeatureDetector.create(FeatureDetector.ORB);
DescriptorExtractor descriptorExtractor = DescriptorExtractor.create(DescriptorExtractor.ORB);
DescriptorMatcher descriptorMatcher = DescriptorMatcher.create(DescriptorMatcher.BRUTEFORCE_HAMMING);
MatOfKeyPoint keyPoints = new MatOfKeyPoint();
featureDetector.detect(rgba, keyPoints); // rgba is the image from current video frame
MatOfDMatch matches = new MatOfDMatch();
Mat currDescriptors = new Mat();
descriptorExtractor.compute(rgba, keyPoints, currDescriptors);
descriptorMatcher.match(currDescriptors, storedDescriptors, matches); // query = current frame, train = prestored features
MatOfDMatch good_matches = filterMatches(matches); // filterMatches returns the matches with a distance < 2.5*min_distance
return good_matches.rows() > threshold;
DescriptorMatcher always finds (or tries to find) the best matches (= the matches with the smallest distances); it can't tell you whether those matches are actually correct. There are some approaches to guess which matches are right and which are wrong:
1. Look at the distances of the matches. If the distance value is too big, the match probably isn't right (even if it was the smallest distance among all keypoints).
2. Instead of computing only the best match, compute the 2nd best match too. If the 2nd best and the best match are of nearly the same quality (= nearly the same distance), you can't really decide that either of them is a good match, so discard both. This is Lowe's ratio test; see the first sketch after this list.
3. Compute a robust homography with RANSAC. If enough inliers appear among your matches, those matches are probably right; see the second sketch after this list.
4. Same as 3, but with a fundamental matrix instead of a homography.
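Points 1 and 2 can be handled in one pass with knnMatch. Here is a minimal sketch reusing the matcher and descriptor names from your code; the 0.75 ratio and the Hamming cut-off of 64 are assumptions you'd need to tune for ORB on your data:

List<MatOfDMatch> knn = new ArrayList<>();
// ask for the 2 nearest stored descriptors for every current-frame descriptor
descriptorMatcher.knnMatch(currDescriptors, storedDescriptors, knn, 2);

float MAX_DISTANCE = 64f; // absolute Hamming cut-off (point 1), assumption: tune it
List<DMatch> ratioFiltered = new ArrayList<>(); // DMatch is in org.opencv.core in 3.x (org.opencv.features2d in 2.4)
for (MatOfDMatch pair : knn) {
    DMatch[] m = pair.toArray();
    // keep a match only if it is absolutely close AND clearly beats the runner-up (point 2)
    if (m.length == 2 && m[0].distance < MAX_DISTANCE
            && m[0].distance < 0.75f * m[1].distance) {
        ratioFiltered.add(m[0]);
    }
}

ratioFiltered.size() is now a much more absolute signal than the raw match count: for frames that don't contain the object, very few matches survive both tests.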
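And a sketch of point 3, assuming you also stored the keypoints of the reference image alongside its descriptors (storedKeyPoints below is hypothetical, as is MIN_INLIERS); keyPoints are the current-frame keypoints from your code. The inlier count after RANSAC makes a good hard threshold:

List<KeyPoint> currList = keyPoints.toList();
List<KeyPoint> storedList = storedKeyPoints.toList(); // assumption: keypoints saved with storedDescriptors
List<Point> objPts = new ArrayList<>();
List<Point> scenePts = new ArrayList<>();
for (DMatch m : ratioFiltered) {
    objPts.add(storedList.get(m.trainIdx).pt);   // trainIdx indexes the stored set
    scenePts.add(currList.get(m.queryIdx).pt);   // queryIdx indexes the current frame
}
if (objPts.size() < 4) return false; // findHomography needs at least 4 point pairs

MatOfPoint2f obj = new MatOfPoint2f();
obj.fromList(objPts);
MatOfPoint2f scene = new MatOfPoint2f();
scene.fromList(scenePts);

double ransacThresh = 3.0; // reprojection error tolerance in pixels
Mat H = Calib3d.findHomography(obj, scene, Calib3d.RANSAC, ransacThresh);
if (H.empty()) return false; // RANSAC failed to find a model

// count inliers by reprojecting the stored points into the current frame
MatOfPoint2f projected = new MatOfPoint2f();
Core.perspectiveTransform(obj, projected, H);
Point[] proj = projected.toArray();
Point[] scn = scene.toArray();
int inliers = 0;
for (int i = 0; i < proj.length; i++) {
    double dx = proj[i].x - scn[i].x, dy = proj[i].y - scn[i].y;
    if (dx * dx + dy * dy <= ransacThresh * ransacThresh) inliers++;
}
int MIN_INLIERS = 10; // assumption: tune per object and scene
return inliers >= MIN_INLIERS;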
I didn't check it, but maybe have a look at the RobustMatcher from the OpenCV real-time pose estimation tutorial:
https://github.com/opencv/opencv/blob/master/samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/src/RobustMatcher.h
http://docs.opencv.org/3.1.0/dc/d2c/tutorial_real_time_pose.html