How to convert world coordinates to image coordinates for datasets like Shrec 17 and DHG


I am working on a machine learning model for detecting hand keypoints from depth images. The datasets I have seen so far include keypoint/skeleton labels in both world and image coordinates (see the Shrec 17 or DHG dataset). I have also seen a couple of papers, with implementations, that learn the world coordinates for keypoint detection. I want to understand how to map the 3D world coordinates onto the depth image so I can visualize the data, and possibly extend the trained model to live prediction/visualization on an Azure Kinect.

Answer

You have to know the calibration matrices of the camera. The pipeline looks like this:

3D World Coordinates --> 3D Camera Coordinates --> 2D Camera Coordinates.

The first step uses the extrinsic calibration (the camera's pose relative to the world frame), and the second step uses the intrinsic calibration (the camera's internal projection parameters). You need the intrinsic step in any case.

Example: let's say you have a LIDAR for detecting 3D points. The world coordinates you get are expressed with respect to the LIDAR's origin, not the camera's. Since your camera is not at the very same place as your LIDAR (which is physically impossible, though if they are close enough you might ignore the offset), you first have to transform these 3D coordinates so that they are represented with respect to the camera's origin. You can do this with rotation and translation matrices if you know the relative pose of the camera and the LIDAR.
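A minimal sketch of this extrinsic step, assuming NumPy; the `R`, `t`, and sample points below are hypothetical placeholders, not values from Shrec 17 or DHG:

```python
import numpy as np

def world_to_camera(points_world, R, t):
    """Transform Nx3 world-frame points into the camera frame:
    X_cam = R @ X_world + t."""
    points_world = np.asarray(points_world, dtype=np.float64)
    return points_world @ R.T + t

# Hypothetical extrinsics: the camera sits 5 cm away from the world
# (LIDAR) origin along x, with no rotation. The real values come from
# the extrinsic calibration of your rig.
R = np.eye(3)
t = np.array([0.05, 0.0, 0.0])

points_world = np.array([[0.1, 0.0, 1.0],
                         [0.0, 0.2, 1.5]])
points_cam = world_to_camera(points_world, R, t)
```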

The second step again uses transformation matrices, but for it you need to know the intrinsic parameters of the camera in use (e.g. focal length, principal point, skew). These can be estimated experimentally if you have the camera, but in your case the calibration matrices should be provided together with the dataset, so ask for them.
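A minimal sketch of this intrinsic (projection) step, again with made-up numbers: the real `fx`, `fy`, `cx`, `cy` must come from the dataset authors or the depth sensor's factory calibration.

```python
import numpy as np

def camera_to_pixel(points_cam, fx, fy, cx, cy):
    """Project Nx3 camera-frame points onto the image plane using the
    pinhole model: u = fx * x / z + cx,  v = fy * y / z + cy."""
    points_cam = np.asarray(points_cam, dtype=np.float64)
    x, y, z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    return np.stack([fx * x / z + cx, fy * y / z + cy], axis=1)

# Hypothetical intrinsics for a 640x480 depth sensor; replace them with
# the values shipped with the dataset or read from the Azure Kinect SDK.
fx, fy, cx, cy = 475.0, 475.0, 320.0, 240.0

# e.g. 22 hand joints already expressed in the camera frame (the output
# of the extrinsic step above), in meters.
joints_cam = np.random.uniform(low=[-0.2, -0.2, 0.4],
                               high=[0.2, 0.2, 0.8], size=(22, 3))
joints_px = camera_to_pixel(joints_cam, fx, fy, cx, cy)
# joints_px can now be drawn on top of the depth image to check alignment.
```

Since the datasets you mention already provide keypoints in image coordinates as well, you can sanity-check your projection by comparing the projected points against those provided 2D labels.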

You can read about all of this here: https://www.mathworks.com/help/vision/ug/camera-calibration.html