I have a set of points and extract a small subset of them to fit a bivariate normal distribution. Afterwards I check whether each of the remaining points fits this distribution by evaluating the PDF at that point and rejecting points whose value is below some threshold.
So much for the theory...
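According to Wikipedia, the PDF is given by the formula:
$$f(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\left(-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_X)^2}{\sigma_X^2} + \frac{(y-\mu_Y)^2}{\sigma_Y^2} - \frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y}\right]\right)$$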
σ is the standard deviation and μ is the mean, calculated as follows:
cv::Scalar mean;
cv::Scalar stdDev;
dataPoints = dataPoints.reshape(3); // convert 3 columns to 3 channels
cv::meanStdDev(dataPoints, mean, stdDev);
dataPoints = dataPoints.reshape(1); // convert back
meanX = mean.val[0];
meanY = mean.val[1];
sigmaX = stdDev.val[0];
sigmaY = stdDev.val[1];
dataPoints is a cv::Mat with 3 columns of floats (x, y, index).
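To cross-check the values that come back from cv::meanStdDev, the same quantities can be computed by hand. This is only a minimal sketch in plain C++ without OpenCV: MeanStd and meanStd are hypothetical names, and it assumes the population standard deviation (division by N rather than N - 1), which is my understanding of what cv::meanStdDev computes.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper type for this sketch, not part of OpenCV.
struct MeanStd {
    double mean;
    double stdDev;
};

// Mean and population standard deviation (divides by N, not N - 1).
// Returns {0, 0} for an empty input.
MeanStd meanStd(const std::vector<double>& v) {
    if (v.empty()) return {0.0, 0.0};
    const double n = static_cast<double>(v.size());

    double sum = 0.0;
    for (double x : v) sum += x;
    const double mean = sum / n;

    double sqSum = 0.0;
    for (double x : v) sqSum += (x - mean) * (x - mean);

    return {mean, std::sqrt(sqSum / n)};
}
```

Calling meanStd once on the x column and once on the y column should give values comparable to meanX, sigmaX and meanY, sigmaY above.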
ρ is the correlation coefficient, which I calculate like this:
cv::matchTemplate(dataPoints.col(0), dataPoints.col(1), rho, cv::TM_CCOEFF_NORMED);
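For comparison, the correlation coefficient can also be computed directly. The sketch below assumes that the intended quantity is the Pearson correlation of the x and y columns; pearson is a hypothetical helper name and the code uses plain C++ without OpenCV. Note that cv::matchTemplate writes its result into a cv::Mat, so the scalar has to be read out of that matrix before it can be used in arithmetic.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation coefficient of two equally long samples.
// Returns 0 if the sizes differ, the input is empty,
// or one of the samples has zero variance.
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    if (x.size() != y.size() || x.empty()) return 0.0;
    const double n = static_cast<double>(x.size());

    double meanX = 0.0, meanY = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        meanX += x[i];
        meanY += y[i];
    }
    meanX /= n;
    meanY /= n;

    // Accumulate the centered cross product and the centered squares.
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - meanX;
        const double dy = y[i] - meanY;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }

    const double denom = std::sqrt(sxx * syy);
    return denom > 0.0 ? sxy / denom : 0.0;
}
```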
The last step is calculating the probability for each point using this:
double p = (1. / (2. * M_PI * sigmaX * sigmaY * sqrt(1. - pow(rho, 2))));
double e = exp((-1. / 2.) * D(x, y, rho));
double ret = p * e;
As far as I know, D() should be the Mahalanobis distance, but OpenCV's cv::Mahalanobis(x, y, rho) returns a different value than my own calculation:
double cX = (x - meanX) / sigmaX;
double cY = (y - meanY) / sigmaY;
double a = (1. / (1. - pow(rho, 2)));
double b = (pow(cX, 2) + pow(cY, 2) - 2. * rho * cX * cY);
double ret = a * b;
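The hand-written expression can be checked against the matrix definition of the squared Mahalanobis distance, (p - μ)ᵀ Σ⁻¹ (p - μ). The sketch below is plain C++ with hypothetical names (BivariateNormal, mahalanobisSqClosed, mahalanobisSqMatrix, pdf) and assumes |ρ| < 1 and non-zero standard deviations. As far as I can tell from the OpenCV documentation, cv::Mahalanobis expects two vectors plus the inverse covariance matrix as its third argument and returns the distance itself (the square root of the quadratic form), so its result is not directly comparable to the expression above.

```cpp
#include <cmath>

// Hypothetical parameter bundle for this sketch.
struct BivariateNormal {
    double meanX, meanY;
    double sigmaX, sigmaY;  // standard deviations, assumed > 0
    double rho;             // correlation coefficient, assumed |rho| < 1
};

// Squared Mahalanobis distance using the closed form from the question.
double mahalanobisSqClosed(const BivariateNormal& d, double x, double y) {
    const double cX = (x - d.meanX) / d.sigmaX;
    const double cY = (y - d.meanY) / d.sigmaY;
    return (cX * cX + cY * cY - 2.0 * d.rho * cX * cY) / (1.0 - d.rho * d.rho);
}

// The same quantity via the explicit inverse of the 2x2 covariance matrix
// [[sxx, sxy], [sxy, syy]], whose inverse is [[syy, -sxy], [-sxy, sxx]] / det.
double mahalanobisSqMatrix(const BivariateNormal& d, double x, double y) {
    const double sxx = d.sigmaX * d.sigmaX;
    const double syy = d.sigmaY * d.sigmaY;
    const double sxy = d.rho * d.sigmaX * d.sigmaY;
    const double det = sxx * syy - sxy * sxy;
    const double dx = x - d.meanX;
    const double dy = y - d.meanY;
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det;
}

// Bivariate normal density at (x, y).
double pdf(const BivariateNormal& d, double x, double y) {
    const double pi = std::acos(-1.0);
    const double norm =
        1.0 / (2.0 * pi * d.sigmaX * d.sigmaY * std::sqrt(1.0 - d.rho * d.rho));
    return norm * std::exp(-0.5 * mahalanobisSqClosed(d, x, y));
}
```

Both distance functions should agree up to rounding, which at least confirms that the closed form matches the matrix definition.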
Now my problem:
As far as I know, the integral over the PDF should be 1 and the maximum value of the PDF should be at (meanX, meanY), so if σ were 0 the PDF at the mean should be 1. But with the computations above I can get values over 1. What am I getting wrong?
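One way to test both expectations from the paragraph above is a small numeric experiment: evaluate the density at the mean and approximate its integral with a midpoint rule. The sketch below uses plain C++ with hypothetical names (bvnPdf, integratePdf), and the integration range of ±8σ and the grid of 400 steps are arbitrary assumptions. The value at the mean is 1 / (2π σx σy sqrt(1 - ρ²)), which grows without bound as σ shrinks, while the integral stays close to 1. A density value is not itself a probability.

```cpp
#include <cmath>

// Bivariate normal density at (x, y); assumes sigmas > 0 and |rho| < 1.
double bvnPdf(double x, double y, double meanX, double meanY,
              double sigmaX, double sigmaY, double rho) {
    const double pi = std::acos(-1.0);
    const double cX = (x - meanX) / sigmaX;
    const double cY = (y - meanY) / sigmaY;
    const double oneMinusRho2 = 1.0 - rho * rho;
    const double d = (cX * cX + cY * cY - 2.0 * rho * cX * cY) / oneMinusRho2;
    const double norm =
        1.0 / (2.0 * pi * sigmaX * sigmaY * std::sqrt(oneMinusRho2));
    return norm * std::exp(-0.5 * d);
}

// Midpoint-rule approximation of the integral of the density over
// [meanX - 8 sigmaX, meanX + 8 sigmaX] x [meanY - 8 sigmaY, meanY + 8 sigmaY].
double integratePdf(double meanX, double meanY, double sigmaX, double sigmaY,
                    double rho, int steps = 400) {
    const double hx = 16.0 * sigmaX / steps;
    const double hy = 16.0 * sigmaY / steps;
    double total = 0.0;
    for (int i = 0; i < steps; ++i) {
        const double x = meanX - 8.0 * sigmaX + (i + 0.5) * hx;
        for (int j = 0; j < steps; ++j) {
            const double y = meanY - 8.0 * sigmaY + (j + 0.5) * hy;
            total += bvnPdf(x, y, meanX, meanY, sigmaX, sigmaY, rho) * hx * hy;
        }
    }
    return total;
}
```

For example, with sigmaX = sigmaY = 0.1 and rho = 0, bvnPdf at the mean is well above 1, yet integratePdf for the same parameters should still come out very close to 1.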