Measuring X, Y, Z coordinates of a detected object in YOLOv3 Darknet

I have trained a YOLOv3 network to detect objects on roads, running on a Python implementation of Darknet.

I am using an Intel RealSense L515 and an Intel RealSense D435i.

How can I obtain the X, Y, Z coordinates of a detected object, along with the distance from the camera to the object itself?

Any help is highly appreciated!

1 Answer

Basically, the conversion you aim to do is the reverse operation of a projection. The only important information here is that you have a depth camera and a prediction produced by an object detector. What you have are UV coordinates (in the image plane) and the depth (the Z coordinate), and you want to convert UVZ (pixel coordinates) to XYZ (world coordinates). Notice that Z doesn't change during the conversion; the shared Z coordinate is what makes the conversion possible.
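
To make that concrete, here is a minimal sketch of the per-pixel back-projection (my illustration, not from the original answer; fx, fy are the focal lengths and cx, cy the principal point introduced in the next paragraph):

# Back-project one pixel (u, v) with depth z (e.g. in metres) to camera-space X, Y, Z.
def uvz_to_xyz_single(u, v, z, fx, fy, cx, cy):
    x = (u - cx) * z / fx  # shift to the optical centre, undo the focal scaling
    y = (v - cy) * z / fy
    return x, y, z         # Z passes through unchanged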

The conversion is based on the pinhole camera model: https://en.wikipedia.org/wiki/Pinhole_camera_model. You need to know the intrinsic parameters of the camera that captured the images: the focal length f and the principal point.
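
Since you are using RealSense cameras, you don't have to calibrate these parameters yourself; the SDK exposes them. A minimal sketch using the pyrealsense2 package (this part is my suggestion, assuming a default depth stream):

import pyrealsense2 as rs

# Start a default pipeline on the connected device (L515 or D435i).
pipeline = rs.pipeline()
profile = pipeline.start()

# Depth stream intrinsics: focal lengths (fx, fy) and principal point (ppx, ppy), in pixels.
intr = profile.get_stream(rs.stream.depth).as_video_stream_profile().get_intrinsics()
print(intr.fx, intr.fy, intr.ppx, intr.ppy)

# The SDK can also deproject a pixel plus depth to a 3D point directly:
# point = rs.rs2_deproject_pixel_to_point(intr, [u, v], depth_in_meters)

pipeline.stop()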

You may also want to visit https://en.wikipedia.org/wiki/Camera_resectioning. There you'll find how to do this conversion using a projection matrix created from the focal length and the principal point. Note that it describes the world-to-pixel projection; you need to compute the inverse projection matrix to do the pixel-to-world conversion.

Below is an example in TensorFlow using such a projection matrix.

import tensorflow as tf

class UVZToXYZConverter:
    def __init__(self, focal_length, principal_point):
        # Projection matrix built from the focal length and the principal point.
        self.projection_matrix = tf.constant(
            [[focal_length[0], 0, principal_point[0], 0],
             [0, focal_length[1], principal_point[1], 0],
             [0, 0, 1, 0],
             [0, 0, 0, 1]], dtype=tf.float32)
        # Its inverse maps pixel coordinates back to world coordinates.
        self.inv_projection_matrix = tf.linalg.inv(self.projection_matrix)

    def uvz_to_xyz(self, points_uvz):
        # points_uvz: [batch, num_points, 3] holding u, v (pixels) and z (depth).
        multiplied_uv = points_uvz[..., 0:2] * points_uvz[..., 2:3]
        ones = tf.ones([points_uvz.shape[0], points_uvz.shape[1], 1], dtype=points_uvz.dtype)
        multiplied_uvz1 = tf.concat([multiplied_uv, points_uvz[..., 2:3], ones], axis=-1)
        transposed_xyz = tf.matmul(self.inv_projection_matrix, multiplied_uvz1, transpose_b=True)
        return tf.transpose(transposed_xyz, [0, 2, 1])[..., :3]
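
For completeness, a hypothetical call with a single YOLO detection (this usage is my illustration, with example values): take the centre of the predicted bounding box as (u, v) and sample the depth map at that pixel to get z.

fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0  # example intrinsics, replace with yours
converter = UVZToXYZConverter((fx, fy), (cx, cy))

# Centre of a YOLO bounding box and the depth sampled there (metres) -- example values.
u, v, z = 350.0, 260.0, 7.5
points_uvz = tf.constant([[[u, v, z]]], dtype=tf.float32)  # shape [1, 1, 3]
xyz = converter.uvz_to_xyz(points_uvz)                     # X, Y, Z in camera space
distance = tf.norm(xyz, axis=-1)                           # Euclidean distance to the camera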