PoseCNN (PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes) uses a CNN as its backbone network. CNNs are trained on 2D RGB data, so how can PoseCNN be trained for 6D object pose estimation on 3D (RGB-D) datasets such as LINEMOD, Occlusion LINEMOD, and YCB-Video, which also contain 3D object models? I don't really understand it.

https://github.com/yuxng/PoseCNN

Similarly, PVNet (PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation) uses the same datasets to train its network.

https://github.com/zju3dv/pvnet

Both methods basically take a semantic segmentation model and modify it for object pose estimation. But CNN-based classification and semantic segmentation models are trained on RGB datasets, whereas for object pose estimation they use 3D datasets like LINEMOD, Occlusion LINEMOD, and YCB-Video, which also contain 3D models of the objects.

Note: These papers use a single RGB camera.
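
If it helps, here is how I currently understand it (a minimal sketch I wrote myself, not code from either repository): the network input is still just the RGB image, and the 3D model plus the annotated 6D pose (R, t) are only used to generate 2D training targets, for example by projecting the model points into the image with the camera intrinsics. All names and numbers below are made up for illustration.

```python
import numpy as np

def project_model_points(model_points, R, t, K):
    """Project an object's 3D model points (N, 3) into the RGB image
    using the annotated 6D pose (R, t) and camera intrinsics K."""
    cam_points = model_points @ R.T + t        # transform model frame -> camera frame
    pixels = cam_points @ K.T                  # apply pinhole intrinsics
    return pixels[:, :2] / pixels[:, 2:3]      # perspective divide -> (u, v)

# Toy example with made-up numbers (not values from a real dataset):
K = np.array([[572.4, 0.0, 325.3],
              [0.0, 573.6, 242.0],
              [0.0, 0.0, 1.0]])                # intrinsics of the single RGB camera
R = np.eye(3)                                  # annotated rotation
t = np.array([0.0, 0.0, 1.0])                  # annotated translation (metres)
cube = np.array([[dx, dy, dz]
                 for dx in (-0.05, 0.05)
                 for dy in (-0.05, 0.05)
                 for dz in (-0.05, 0.05)])     # stand-in for the object's 3D model points
uv = project_model_points(cube, R, t, K)
print(uv)                                      # 2D targets a CNN could be trained against
```

Is this roughly what PoseCNN and PVNet do with the 3D models, i.e. the 3D information lives in the labels rather than in the network input?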

What specifically do we need to do to create such 3D datasets?
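
For context, I imagine a per-frame annotation would need to look roughly like this (a hypothetical layout I made up for illustration, not the actual LINEMOD or YCB-Video format):

```python
# Hypothetical per-frame annotation; all field names are invented:
frame_annotation = {
    "rgb": "rgb/000001.png",          # the RGB image the CNN actually sees
    "depth": "depth/000001.png",      # optional depth map, used by some methods for refinement
    "objects": [
        {
            "class_id": 1,                        # which 3D model this instance corresponds to
            "rotation": [[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]],              # rotation: model frame -> camera frame
            "translation": [0.0, 0.0, 1.0],       # translation in the camera frame (metres)
        }
    ],
    "camera_intrinsics": [[572.4, 0.0, 325.3],
                          [0.0, 573.6, 242.0],
                          [0.0, 0.0, 1.0]],
}
```

Is collecting RGB(-D) frames and annotating each object with such a pose relative to its 3D model essentially what is required, and if so, how are those ground-truth poses obtained in practice?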
