Camera pose estimation


I am trying to write a program from scratch that can estimate the pose of a camera. I am open to any programming language and to using built-in functions/methods for feature detection...

I have been exploring different ways of estimating pose, like SLAM, PTAM, DTAM, etc., but I don't really need tracking and mapping; I just need the pose.

Can any of you suggest an approach or any resource that can help me? I know what pose is and have a rough idea of how to estimate it, but I am unable to find any resources that explain how it can be done.

I was thinking of starting with a recorded video, extracting features from it, and then using these features and geometry to estimate the pose.


2 Answers

BEST ANSWER

Generally, you can extract the pose of a camera only relative to a given reference frame. It is quite common to estimate the relative pose between one view of a camera and another view. The most general relationship between two views of the same scene taken from two different camera positions is given by the fundamental matrix (google it). You can compute the fundamental matrix from point correspondences between the images; see, for example, the MATLAB implementation: http://www.mathworks.com/help/vision/ref/estimatefundamentalmatrix.html. After computing it, you can decompose the fundamental matrix to obtain the relative pose between the cameras (look here for an example: http://www.daesik80.com/matlabfns/function/DecompPMatQR.m).
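
For illustration, here is a minimal sketch of that pipeline in Python with OpenCV rather than MATLAB; the image filenames, the choice of ORB features, and the RANSAC parameters are all assumptions on my part, not something fixed by the method:

```python
import cv2
import numpy as np

# Load two views of the same scene (hypothetical filenames).
img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

# Detect and describe features with ORB.
orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Match descriptors (Hamming distance suits binary ORB descriptors).
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# Collect the matched point coordinates in each image.
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# Robustly estimate the fundamental matrix with RANSAC.
F, inliers = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
print("Fundamental matrix:\n", F)
```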

The procedure is similar if you have a calibrated camera; in that case you use the essential matrix instead of the fundamental matrix.
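
Continuing the sketch above: with a calibrated camera (the intrinsic matrix K below is an assumed placeholder), OpenCV can estimate the essential matrix and decompose it into a relative rotation and translation:

```python
# K is the (assumed known) 3x3 intrinsic calibration matrix.
K = np.array([[700.0,   0.0, 320.0],
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Estimate the essential matrix from the same correspondences.
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                               prob=0.999, threshold=1.0)

# Decompose E; recoverPose picks the physically valid (R, t) via a
# cheirality check (points must lie in front of both cameras).
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
print("Relative rotation:\n", R)
print("Relative translation (up to scale):\n", t)
```

Note that the translation recovered this way is only defined up to scale; two views alone cannot tell you the metric distance between the cameras.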

ANSWER

In order to compute a camera pose, you need a reference frame given by some known points in the image. These known points can come, for example, from a calibration pattern, but they can also be known landmarks in your images (for example, the four corners of the base of the Gizeh pyramids).

The problem of estimating the pose of a camera that observes known landmarks (i.e., recovering the camera's 3D position and orientation from the 2D image projections of known 3D points) is classically known as PnP (Perspective-n-Point). OpenCV provides a ready-made solver for this problem.
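
As a minimal sketch of that step, here is how OpenCV's solvePnP can be called; the 3D landmark coordinates, their 2D projections, and the intrinsics below are placeholder values you would replace with your own:

```python
import cv2
import numpy as np

# Known 3D landmark positions in the world frame (placeholder values).
object_points = np.array([[0.0, 0.0, 0.0],
                          [1.0, 0.0, 0.0],
                          [1.0, 1.0, 0.0],
                          [0.0, 1.0, 0.0]], dtype=np.float64)

# Their observed 2D projections in the image (placeholder values).
image_points = np.array([[320.0, 240.0],
                         [400.0, 240.0],
                         [400.0, 320.0],
                         [320.0, 320.0]], dtype=np.float64)

# Intrinsics and distortion coefficients from calibration (assumed here).
K = np.array([[700.0,   0.0, 320.0],
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])
dist_coeffs = np.zeros(5)

# Solve PnP: rvec/tvec transform world coordinates into the camera frame.
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist_coeffs)
R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix

# Camera position in world coordinates: C = -R^T * tvec.
camera_position = -R.T @ tvec
print("Camera position:", camera_position.ravel())
```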

However, you first need to calibrate your camera, i.e., determine what makes it unique. The parameters you need to estimate are called intrinsic parameters, because they depend on the camera's focal length, sensor size, and so on, but not on the camera's location or orientation. These parameters mathematically describe how world points are projected onto your camera's sensor frame. You can estimate them from known planar patterns (again, OpenCV has some ready-made functions for that).
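
Here is a rough sketch of such a calibration using OpenCV's chessboard routines; the 9x6 pattern size, the square size, and the file-name pattern are assumptions:

```python
import glob
import cv2
import numpy as np

pattern_size = (9, 6)   # inner corners of the (assumed) chessboard pattern
square_size = 0.025     # square edge length in meters (assumed)

# 3D coordinates of the chessboard corners in the pattern's own plane (Z=0).
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2)
objp *= square_size

object_points, image_points = [], []
for path in glob.glob("calib_*.png"):  # hypothetical calibration images
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if found:
        # Refine corner locations to sub-pixel accuracy.
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001))
        object_points.append(objp)
        image_points.append(corners)

# Estimate the intrinsic matrix K and the lens distortion coefficients.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    object_points, image_points, gray.shape[::-1], None, None)
print("RMS reprojection error:", rms)
print("Intrinsic matrix K:\n", K)
```

Once you have K and the distortion coefficients, you can feed them straight into the PnP step above to get the camera pose relative to your known landmarks.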