PoseCNN (PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes) uses a CNN as its backbone network. CNNs are trained on 2D RGB data, so how can PoseCNN be trained for 6D object pose estimation on 3D (RGB-D) datasets such as LINEMOD, Occlusion LINEMOD, and YCB-Video, which also contain 3D object models? I don't really understand it.

https://github.com/yuxng/PoseCNN

Similarly, PVNet (PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation) uses the same datasets to train its network.

https://github.com/zju3dv/pvnet

Both methods basically take a semantic segmentation model and modify it for object pose estimation. But CNN-based classification and semantic segmentation models are trained on RGB datasets, whereas for object pose estimation they use 3D datasets like LINEMOD, Occlusion LINEMOD, and YCB-Video, which also contain 3D models of the objects.

Note: These papers use a single RGB camera.
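
If it helps, here is how I currently understand it (a minimal sketch I wrote myself, not code from either repository): the network input is still just the RGB image, and the 3D model plus the annotated 6D pose (R, t) are only used to generate 2D training targets, for example by projecting the model points into the image with the camera intrinsics. All names and numbers below are made up for illustration.

```python
import numpy as np

def project_model_points(model_points, R, t, K):
    """Project an object's 3D model points (N, 3) into the RGB image
    using the annotated 6D pose (R, t) and camera intrinsics K."""
    cam_points = model_points @ R.T + t        # transform model frame -> camera frame
    pixels = cam_points @ K.T                  # apply pinhole intrinsics
    return pixels[:, :2] / pixels[:, 2:3]      # perspective divide -> (u, v)

# Toy example with made-up numbers (not values from a real dataset):
K = np.array([[572.4, 0.0, 325.3],
              [0.0, 573.6, 242.0],
              [0.0, 0.0, 1.0]])                # intrinsics of the single RGB camera
R = np.eye(3)                                  # annotated rotation
t = np.array([0.0, 0.0, 1.0])                  # annotated translation (metres)
cube = np.array([[dx, dy, dz]
                 for dx in (-0.05, 0.05)
                 for dy in (-0.05, 0.05)
                 for dz in (-0.05, 0.05)])     # stand-in for the object's 3D model points
uv = project_model_points(cube, R, t, K)
print(uv)                                      # 2D targets a CNN could be trained against
```

Is this roughly what PoseCNN and PVNet do with the 3D models, i.e. the 3D information lives in the labels rather than in the network input?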

What specifically do we need to do to create such 3D datasets?
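
For context, I imagine a per-frame annotation would need to look roughly like this (a hypothetical layout I made up for illustration, not the actual LINEMOD or YCB-Video format):

```python
# Hypothetical per-frame annotation; all field names are invented:
frame_annotation = {
    "rgb": "rgb/000001.png",          # the RGB image the CNN actually sees
    "depth": "depth/000001.png",      # optional depth map, used by some methods for refinement
    "objects": [
        {
            "class_id": 1,                        # which 3D model this instance corresponds to
            "rotation": [[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]],              # rotation: model frame -> camera frame
            "translation": [0.0, 0.0, 1.0],       # translation in the camera frame (metres)
        }
    ],
    "camera_intrinsics": [[572.4, 0.0, 325.3],
                          [0.0, 573.6, 242.0],
                          [0.0, 0.0, 1.0]],
}
```

Is collecting RGB(-D) frames and annotating each object with such a pose relative to its 3D model essentially what is required, and if so, how are those ground-truth poses obtained in practice?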
