How to get Bird's Eye View from KITTI by Projection Matrix?

The goal is to get a Bird's Eye View from images in the KITTI dataset. I have the 3x4 projection matrix.

There are many ways to generate transformation matrices. For the Bird's Eye View I have come across a few mathematical expressions, such as:

H_{12} = H_2 H_1^{-1} = A R A^{-1} = P A^{-1}, from "OpenCV - Projection, homography matrix and bird's eye view"

and x = P_i * Tr * X, from "kitti dataset camera projection matrix"

but none of these options worked for my purpose.
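For reference, the KITTI relation x = P_i * Tr * X can be sketched in NumPy as follows. This is only a minimal illustration: it assumes the 3D point is already expressed in the rectified camera frame (so Tr is taken as the identity), the helper name project is hypothetical, and the matrix values are copied from the code in this question.

```python
import numpy as np

# 3x4 projection matrix, values copied from the question.
P = np.array([[721.5377, 0.0, 609.5593, 44.85728],
              [0.0, 721.5377, 72.854, 0.2163791],
              [0.0, 0.0, 1.0, 0.002745884]])


def project(P, point_3d):
    """Project a 3D point (assumed to be in the rectified camera frame) to pixels."""
    X = np.append(np.asarray(point_3d, dtype=float), 1.0)  # homogeneous 4-vector
    u, v, w = P @ X                                         # homogeneous pixel
    return u / w, v / w                                     # perspective divide


# A point 10 m straight ahead of the camera (assumed example value).
u, v = project(P, (0.0, 0.0, 10.0))
print(u, v)
```

The result is a single pixel position. A 3x4 matrix maps 3D points to pixels, which is why it cannot be passed to cv2.warpPerspective, which expects a 3x3 pixel-to-pixel mapping.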

PYTHON CODE

import numpy as np
import cv2

image = cv2.imread('Data/RGB/000007.png')
maxHeight, maxWidth = image.shape[:2]

# M has 3x4 dimensions
M = np.array(([721.5377, 0.0, 609.5593, 44.85728],
              [0.0, 721.5377, 72.854, 0.2163791],
              [0.0, 0.0, 1.0, .002745884]))

# A 3x3 M matrix is required here
warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))

# show the original and warped images
cv2.imshow("Original", image)
cv2.imshow("Warped", warped)
cv2.waitKey(0)

I need to know how to use the Projection Matrix to get the Bird's Eye View.

So far, everything I've tried has produced warped images that are nowhere near what I need.

This is an example image from the KITTI dataset.

This is another example image from the KITTI dataset.

On the left, images are shown detecting cars in 3D (above) and 2D (below). On the right is the Bird's Eye View that I want to obtain. Therefore, I need to obtain the transformation matrix to transform the coordinates of the boxes that delimit the cars.

There are 1 best solutions below

Here is my code to manually build a bird's eye view transform:

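cv::Mat1d CameraModel::getInversePerspectiveMapping(double pixelPerMeter, cv::Point const & origin) const {
    // Focal length of the virtual top-down camera: scaled by the camera height
    // so that 1 meter on the ground covers pixelPerMeter pixels.
    double f = pixelPerMeter * cameraPosition()[2];
    // Swap the x and y axes so the car's forward axis runs along the image's vertical axis.
    cv::Mat1d R(3,3);
    R <<  0, 1, 0,
          1, 0, 0,
          0, 0, 1;

    // Intrinsics of the virtual camera; origin is where the real camera appears in the output.
    cv::Mat1d K(3,3);
    K << f, 0, origin.x,
         0, f, origin.y,
         0, 0, 1;
    // Only the 3x3 rotation block of the camera-to-car transform is used.
    cv::Mat1d transformtoGround = K * R * mCameraToCarMatrix(cv::Range(0,3), cv::Range(0,3));
    // The inverse intrinsics turn source pixels into viewing rays before they are rotated.
    return transformtoGround * mIntrinsicMatrix.inv();
}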

The member variables and functions used inside the function are:

  • mCameraToCarMatrix: a 4x4 matrix holding the homogeneous rigid transformation from the camera's coordinate system to the car's coordinate system. The camera's axes are x-right, y-down, z-forward. The car's axes are x-forward, y-left, z-up. Within this function only the rotation part of mCameraToCarMatrix is used.
  • mIntrinsicMatrix: the 3x3 matrix holding the camera's intrinsic parameters
  • cameraPosition()[2]: the Z-coordinate (height) of the camera in the car's coordinate frame. It's the same as mCameraToCarMatrix(2,3).

The function parameters:

  • pixelPerMeter: the resolution of the bird's eye view image. A distance of 1 meter on the XY plane will translate to pixelPerMeter pixels in the bird's eye view image.
  • origin: the camera's position in the bird's eye view image

You can pass the transform matrix to cv::initUndistortRectifyMap() as newCameraMatrix and then use cv::remap() to create the bird's eye view image.
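Since the question uses Python, the same construction can be sketched in NumPy. This is a hedged port, not a drop-in solution. The function names are hypothetical, the rotation assumes a level camera looking straight ahead, the 1.65 m camera height is only an illustrative value, and the intrinsics are assumed to be the left 3x3 block of the question's projection matrix.

```python
import numpy as np


def inverse_perspective_mapping(K_cam, R_cam_to_car, camera_height,
                                pixel_per_meter, origin):
    """Build the 3x3 matrix mapping source pixels to bird's eye view pixels."""
    f = pixel_per_meter * camera_height
    # Swap x and y so the car's forward axis runs along the image's vertical axis.
    R_swap = np.array([[0.0, 1.0, 0.0],
                       [1.0, 0.0, 0.0],
                       [0.0, 0.0, 1.0]])
    # Intrinsics of the virtual top-down camera.
    K_bev = np.array([[f, 0.0, origin[0]],
                      [0.0, f, origin[1]],
                      [0.0, 0.0, 1.0]])
    return K_bev @ R_swap @ R_cam_to_car @ np.linalg.inv(K_cam)


def warp_point(H, u, v):
    """Apply the homography H to one pixel (u, v)."""
    x, y, w = H @ np.array([u, v, 1.0])
    return x / w, y / w


# Assumed intrinsics: left 3x3 block of the question's projection matrix.
K_cam = np.array([[721.5377, 0.0, 609.5593],
                  [0.0, 721.5377, 72.854],
                  [0.0, 0.0, 1.0]])
# Assumed rotation for a level, forward-looking camera:
# camera (x-right, y-down, z-forward) -> car (x-forward, y-left, z-up).
R_cam_to_car = np.array([[0.0, 0.0, 1.0],
                         [-1.0, 0.0, 0.0],
                         [0.0, -1.0, 0.0]])
camera_height = 1.65  # meters, illustrative value only

H = inverse_perspective_mapping(K_cam, R_cam_to_car, camera_height,
                                pixel_per_meter=20.0, origin=(200.0, 400.0))

# Ground point 10 m ahead and 2 m to the left of the camera,
# written in camera coordinates as (x=-2, y=camera_height, z=10).
u, v, w = K_cam @ np.array([-2.0, camera_height, 10.0])
bev_u, bev_v = warp_point(H, u / w, v / w)
print(bev_u, bev_v)  # approximately (160, 200): origin minus 20 px per meter
```

Because H is a 3x3 pixel-to-pixel mapping, it should also be usable with cv2.warpPerspective(image, H, (width, height)) when lens distortion is negligible. That call is not exercised in this sketch.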