How to track a single point (e.g. corner) in UAV images?


I'm trying to track a single point, usually a corner, across UAV images. A rough initial position is available for each image (it deviates from the true position by about 30 pixels). The figure below shows the ROI obtained from the initial position; the yellow markers are the tracked points. This is a simple scenario, but the results still contain many errors.

A simple example for tracking

Some requirements:

  • The task should be faster than doing it manually; the faster the better;

  • It is not necessary to return a result for every image (roughly 25-30 results out of 100 images is enough), but the returned points must be as accurate as possible;

  • It must cope with the characteristics of UAV images: noise and affine transformations such as large rotations and translations.

My current idea is:

  • choose a reference image, manually select a corner in it as the point to track, and encode that point with some kind of descriptor as its feature vector;

  • for each of the other images:

  1. Obtain an ROI centered on the initial position;

  2. Perform corner detection on the ROI and compute the descriptors;

  3. Match these descriptors against the reference point's descriptor; the keypoint with the most similar descriptor is taken as the tracked point.
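
For step 3, the matching is a nearest-neighbour search against a single reference descriptor, and for binary descriptors such as ORB the metric should be Hamming distance. This is a stdlib-only sketch of the search I have in mind (no OpenCV; `Descriptor`, `hammingDistance`, and `matchToReference` are illustrative names, with 256-bit descriptors stored as eight 32-bit words):

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// A descriptor is 256 bits, stored as eight 32-bit words (like ORB's 32 bytes).
using Descriptor = std::array<std::uint32_t, 8>;

// Hamming distance between two binary descriptors: count differing bits.
int hammingDistance(const Descriptor &a, const Descriptor &b)
{
    int dist = 0;
    for (std::size_t w = 0; w < a.size(); ++w)
        dist += static_cast<int>(std::bitset<32>(a[w] ^ b[w]).count());
    return dist;
}

// Return the index of the candidate closest to the reference descriptor,
// or -1 if even the best distance exceeds maxDist (reject weak matches).
int matchToReference(const Descriptor &reference,
                     const std::vector<Descriptor> &candidates,
                     int maxDist)
{
    int bestIdx = -1, bestDist = maxDist + 1;
    for (std::size_t i = 0; i < candidates.size(); ++i)
    {
        int d = hammingDistance(reference, candidates[i]);
        if (d < bestDist) { bestDist = d; bestIdx = static_cast<int>(i); }
    }
    return bestIdx;
}
```

In OpenCV terms this corresponds to `BFMatcher` constructed with `cv::NORM_HAMMING`; the `maxDist` rejection threshold is my own addition to drop frames where no candidate is convincing.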

In my experiments I simply ran the above pipeline with OpenCV's ORB detector and BFMatcher, but did not get good results. Suspecting inaccurate matching, I tried the optimal-transport matching used in SuperGlue [1], but the results were mediocre: they were sensitive to the initial value assigned to the "dustbin" and to the value of the lambda parameter. Other attempts, including switching to other corner detectors (CSS, phase congruency) and running corner detection on the reference image before matching, were also unsatisfactory. What is the right direction for this problem?

My attempts:

  1. ORB detector/descriptors + BFMatcher

  2. ORB detector/descriptors + Optimal Transport Match

void optimalTransportMatch(const cv::Mat &descriptors1, const cv::Mat &descriptors2, cv::Mat &matchMatrix)
{
    // 0. Build the augmented cost/score matrices (one extra row and column for the dustbin).
    int matchRows = descriptors1.rows + 1;
    int matchCols = descriptors2.rows + 1;
    matchMatrix = cv::Mat(matchRows, matchCols, CV_64FC1);

    cv::Mat costMatrix = cv::Mat(matchRows, matchCols, CV_64FC1);
    double binValue = 128 * 7; // dustbin cost; should be on the scale of typical descriptor distances
    double lambda = 0.2;       // entropic regularization weight; needs tuning
    for (int i = 0; i < matchRows; i++)
    {
        for (int j = 0; j < matchCols; j++)
        {
            // Dustbin entries: last row, last column, and the bin-to-bin corner
            // all get the same fixed cost.
            if (i == matchRows - 1 || j == matchCols - 1)
            {
                costMatrix.at<double>(i, j) = binValue;
                matchMatrix.at<double>(i, j) = exp(-lambda * binValue);
                continue;
            }
            // Regular entries: ORB descriptors are binary, so use Hamming
            // distance (NORM_L2 on packed CV_8U descriptors gives meaningless costs).
            double cost = cv::norm(descriptors1.row(i), descriptors2.row(j), cv::NORM_HAMMING);
            costMatrix.at<double>(i, j) = cost;
            matchMatrix.at<double>(i, j) = exp(-lambda * cost);
        }
    }
    // 1. Sinkhorn-Knopp iterations: alternately rescale the rows and columns
    // toward the target marginals (1 per keypoint, N for the dustbin).
    double distEpsilon = 1e-8, iterMaxDist = 1.0;
    int iterNum = 0, iterMax = 100; // cap iterations so the loop cannot run forever

    cv::Mat stdColsSum = cv::Mat::ones(matchRows, 1, CV_64FC1);
    cv::Mat stdRowsSum = cv::Mat::ones(1, matchCols, CV_64FC1);
    stdColsSum.at<double>(matchRows - 1, 0) = matchCols - 1; // dustbin may absorb all of image 2
    stdRowsSum.at<double>(0, matchCols - 1) = matchRows - 1; // dustbin may absorb all of image 1

    cv::Mat rowsSum, rowSums, rowSumsDiff;
    while (iterMaxDist > distEpsilon && iterNum < iterMax)
    {
        iterNum++;
        cv::Mat rowsScale, colsScale;
        // Scale rows so each row sum matches its target marginal.
        cv::reduce(matchMatrix, rowSums, 1, cv::REDUCE_SUM, CV_64FC1);
        cv::divide(stdColsSum, rowSums, rowsScale);
        rowsScale = cv::repeat(rowsScale, 1, matchCols);
        cv::multiply(matchMatrix, rowsScale, matchMatrix);
        // Scale columns so each column sum matches its target marginal.
        cv::reduce(matchMatrix, rowsSum, 0, cv::REDUCE_SUM, CV_64FC1);
        cv::divide(stdRowsSum, rowsSum, colsScale);
        colsScale = cv::repeat(colsScale, matchRows, 1);
        cv::multiply(matchMatrix, colsScale, matchMatrix);
        // Convergence: how far the row sums drift from target after column scaling.
        cv::reduce(matchMatrix, rowSums, 1, cv::REDUCE_SUM, CV_64FC1);
        rowSumsDiff = cv::abs(rowSums - stdColsSum);
        cv::minMaxLoc(rowSumsDiff, NULL, &iterMaxDist);
    }
}
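
To sanity-check the Sinkhorn normalization independently of OpenCV, the same alternating row/column rescaling can be run on a plain matrix. This is a stdlib-only sketch (`sinkhorn`, `rowTargets`, and `colTargets` are illustrative names; in the SuperGlue-style setup the last entries of the targets would be the dustbin masses, matchCols - 1 and matchRows - 1):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Alternate row/column scaling of a positive matrix so its row sums approach
// rowTargets and its column sums approach colTargets (Sinkhorn-Knopp).
// Requires consistent targets: sum(rowTargets) == sum(colTargets).
void sinkhorn(std::vector<std::vector<double>> &M,
              const std::vector<double> &rowTargets,
              const std::vector<double> &colTargets,
              int maxIter = 100, double eps = 1e-8)
{
    const std::size_t rows = M.size(), cols = M[0].size();
    for (int it = 0; it < maxIter; ++it)
    {
        // Scale each row to its target sum.
        for (std::size_t i = 0; i < rows; ++i)
        {
            double s = 0.0;
            for (double v : M[i]) s += v;
            for (double &v : M[i]) v *= rowTargets[i] / s;
        }
        // Scale each column to its target sum.
        for (std::size_t j = 0; j < cols; ++j)
        {
            double s = 0.0;
            for (std::size_t i = 0; i < rows; ++i) s += M[i][j];
            for (std::size_t i = 0; i < rows; ++i) M[i][j] *= colTargets[j] / s;
        }
        // Converged when the row sums no longer drift after column scaling.
        double maxErr = 0.0;
        for (std::size_t i = 0; i < rows; ++i)
        {
            double s = 0.0;
            for (double v : M[i]) s += v;
            maxErr = std::max(maxErr, std::fabs(s - rowTargets[i]));
        }
        if (maxErr < eps) break;
    }
}
```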

Notes:

  • This problem is actually the automatic detection of ground control points in UAV imagery. Standardized control points are easy to identify because of their specific shapes, but non-standardized control points are difficult to detect and track. There is related literature [2], but its method simply returns the corner point nearest to the reprojected position, which obviously fails when the reprojection error is large or the scene is complex;

  • If the initial position deviates a lot (e.g. 200 pixels), how can accurate tracking still be performed?

  • Ground control points are often corner points of road signs.
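
One direction I am considering for the large-deviation case is coarse-to-fine search: first localize the patch around the reference corner by normalized cross-correlation over an enlarged search window, then run corner detection and descriptor matching only near the correlation peak. This is a stdlib-only sketch of the coarse stage (`Image`, `nccAt`, and `coarseLocate` are hypothetical names, not the method of [2]):

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Grayscale image stored as row-major doubles.
struct Image
{
    std::size_t w, h;
    std::vector<double> px;
    double at(std::size_t x, std::size_t y) const { return px[y * w + x]; }
};

// Zero-mean normalized cross-correlation of tmpl against img with its
// top-left corner placed at (x, y); result is in [-1, 1].
double nccAt(const Image &img, const Image &tmpl, std::size_t x, std::size_t y)
{
    const double n = static_cast<double>(tmpl.w * tmpl.h);
    double meanI = 0, meanT = 0;
    for (std::size_t v = 0; v < tmpl.h; ++v)
        for (std::size_t u = 0; u < tmpl.w; ++u)
        { meanI += img.at(x + u, y + v); meanT += tmpl.at(u, v); }
    meanI /= n; meanT /= n;
    double num = 0, dI = 0, dT = 0;
    for (std::size_t v = 0; v < tmpl.h; ++v)
        for (std::size_t u = 0; u < tmpl.w; ++u)
        {
            double a = img.at(x + u, y + v) - meanI, b = tmpl.at(u, v) - meanT;
            num += a * b; dI += a * a; dT += b * b;
        }
    double den = std::sqrt(dI * dT);
    return den > 0 ? num / den : 0.0; // flat regions score zero
}

// Slide the template over the image; return the top-left offset of the best match.
std::pair<std::size_t, std::size_t> coarseLocate(const Image &img, const Image &tmpl)
{
    std::pair<std::size_t, std::size_t> best{0, 0};
    double bestScore = -2.0;
    for (std::size_t y = 0; y + tmpl.h <= img.h; ++y)
        for (std::size_t x = 0; x + tmpl.w <= img.w; ++x)
        {
            double s = nccAt(img, tmpl, x, y);
            if (s > bestScore) { bestScore = s; best = {x, y}; }
        }
    return best;
}
```

With OpenCV the same coarse stage would be `cv::matchTemplate` with `cv::TM_CCOEFF_NORMED` followed by `cv::minMaxLoc`; the question is whether it stays robust under the large rotations mentioned above.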

[1]P.-E. Sarlin, D. DeTone, T. Malisiewicz, and A. Rabinovich, ‘SuperGlue: Learning Feature Matching With Graph Neural Networks’, in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2020, pp. 4937–4946.

[2]Z. Zhu, T. Bao, Y. Hu, and J. Gong, ‘A Novel Method for Fast Positioning of Non-Standardized Ground Control Points in Drone Images’, Remote Sensing, vol. 13, no. 15, Art. no. 15, Jan. 2021, doi: 10.3390/rs13152849.
