converting a disparity map to a depth map using given calibration file

1.4k Views Asked by At

I have a pair of stereo images and I have computed the disparity image for them. Now I need to convert this disparity map to a depth map. I have found that

depth = (baseline * focal length) / disparity)

but I don't know how to find the values for baseline, focal length and disparity. This is the calibration file given.

calib_time: 09-Jan-2012 13:57:47
corner_dist: 9.950000e-02
S_00: 1.392000e+03 5.120000e+02
K_00: 9.842439e+02 0.000000e+00 6.900000e+02 0.000000e+00 9.808141e+02 2.331966e+02 0.000000e+00 0.000000e+00 1.000000e+00
D_00: -3.728755e-01 2.037299e-01 2.219027e-03 1.383707e-03 -7.233722e-02
R_00: 1.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00
T_00: 2.573699e-16 -1.059758e-16 1.614870e-16
S_rect_00: 1.242000e+03 3.750000e+02
R_rect_00: 9.999239e-01 9.837760e-03 -7.445048e-03 -9.869795e-03 9.999421e-01 -4.278459e-03 7.402527e-03 4.351614e-03 9.999631e-01
P_rect_00: 7.215377e+02 0.000000e+00 6.095593e+02 0.000000e+00 0.000000e+00 7.215377e+02 1.728540e+02 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00 0.000000e+00
S_01: 1.392000e+03 5.120000e+02
K_01: 9.895267e+02 0.000000e+00 7.020000e+02 0.000000e+00 9.878386e+02 2.455590e+02 0.000000e+00 0.000000e+00 1.000000e+00
D_01: -3.644661e-01 1.790019e-01 1.148107e-03 -6.298563e-04 -5.314062e-02
R_01: 9.993513e-01 1.860866e-02 -3.083487e-02 -1.887662e-02 9.997863e-01 -8.421873e-03 3.067156e-02 8.998467e-03 9.994890e-01
T_01: -5.370000e-01 4.822061e-03 -1.252488e-02
S_rect_01: 1.242000e+03 3.750000e+02
R_rect_01: 9.996878e-01 -8.976826e-03 2.331651e-02 8.876121e-03 9.999508e-01 4.418952e-03 -2.335503e-02 -4.210612e-03 9.997184e-01
P_rect_01: 7.215377e+02 0.000000e+00 6.095593e+02 -3.875744e+02 0.000000e+00 7.215377e+02 1.728540e+02 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00 0.000000e+00
S_02: 1.392000e+03 5.120000e+02
K_02: 9.597910e+02 0.000000e+00 6.960217e+02 0.000000e+00 9.569251e+02 2.241806e+02 0.000000e+00 0.000000e+00 1.000000e+00
D_02: -3.691481e-01 1.968681e-01 1.353473e-03 5.677587e-04 -6.770705e-02
R_02: 9.999758e-01 -5.267463e-03 -4.552439e-03 5.251945e-03 9.999804e-01 -3.413835e-03 4.570332e-03 3.389843e-03 9.999838e-01
T_02: 5.956621e-02 2.900141e-04 2.577209e-03
S_rect_02: 1.242000e+03 3.750000e+02
R_rect_02: 9.998817e-01 1.511453e-02 -2.841595e-03 -1.511724e-02 9.998853e-01 -9.338510e-04 2.827154e-03 9.766976e-04 9.999955e-01
P_rect_02: 7.215377e+02 0.000000e+00 6.095593e+02 4.485728e+01 0.000000e+00 7.215377e+02 1.728540e+02 2.163791e-01 0.000000e+00 0.000000e+00 1.000000e+00 2.745884e-03
S_03: 1.392000e+03 5.120000e+02
K_03: 9.037596e+02 0.000000e+00 6.957519e+02 0.000000e+00 9.019653e+02 2.242509e+02 0.000000e+00 0.000000e+00 1.000000e+00
D_03: -3.639558e-01 1.788651e-01 6.029694e-04 -3.922424e-04 -5.382460e-02
R_03: 9.995599e-01 1.699522e-02 -2.431313e-02 -1.704422e-02 9.998531e-01 -1.809756e-03 2.427880e-02 2.223358e-03 9.997028e-01
T_03: -4.731050e-01 5.551470e-03 -5.250882e-03
S_rect_03: 1.242000e+03 3.750000e+02
R_rect_03: 9.998321e-01 -7.193136e-03 1.685599e-02 7.232804e-03 9.999712e-01 -2.293585e-03 -1.683901e-02 2.415116e-03 9.998553e-01
P_rect_03: 7.215377e+02 0.000000e+00 6.095593e+02 -3.395242e+02 0.000000e+00 7.215377e+02 1.728540e+02 2.199936e+00 0.000000e+00 0.000000e+00 1.000000e+00 2.729905e-03
1

There are 1 best solutions below

5
On

I'll give you two answers. The first one is the academic one.

In your data (not sure where it comes from), I distinguish each line from the first letter: S = Image size K = Intrinsic camera matrix (3x3) D = Distortion coefficients R = Rotation matrix (3x4) T = Translation vector (3x1)

It seems that we have 4 cameras here, so not sure. That's up to you to understand. Anyway, given the data above you may calculate camera center in the space. For the first camera:

C_00 = -R_00 . T_00

where . stands for the dot product. Do the same for the other camera, then the baseline is simply the distance between the two camera centers.

Focal length is already there: element (0,0) of intrinsic matrix is fx focal length over the x axis. Element (1,1) is fy focal length over y axis. Why two focal lengths??? You will see that they are very similar indeed, but the difference reflects the imperfection of the camera lenses.

So, how do you get the depth??? Here I go with the less academic answer. In practice you have to follow this process:

  1. Calibrate the two cameras (using metric units)
  2. Take pictures
  3. Undistort images (correct lens distortion using your D_00 coefficients)
  4. Rectify images (what's rectification? You can check here and here). After this step the left and right images look like they were taken by the same camera only shifted along x.
  5. Apply the formula to calculate depth. However, since you have fx and fy, but also the shear factor (element (0,1) of the camera matrix) and the principal point shift (elements (0,2) and (1,2) of the camera matrix), the formula is slightly complex that the one you suggested. So I suggest to use OpenCV and use the function cv2.reprojectImageTo3D, calculating the Q matrix as done here.

You can find a working example to go from images to depth map and finally to depth (point cloud), here.

I hope this gives you a starting point. Cheers.