I'm trying to make a camera tracking app that works inside a well-known 3D program. So far, I've implemented 2D tracking and a basic camera pose solver based on the DLT method, plus a radial distortion corrector. They're all basic, but they seem to work well enough.
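For context, here's the gist of my current DLT pose solve, sketched in numpy purely for readability (my real code is hand-rolled, and I've left out details like point normalization; names are illustrative only):

```python
import numpy as np

def dlt_pose(world_pts, image_pts):
    """Recover the 3x4 projection matrix P from >= 6 known 3D world
    points and their tracked 2D image positions (inputs as sequences
    of (X, Y, Z) and (x, y) tuples)."""
    rows = []
    for (X, Y, Z), (x, y) in zip(world_pts, image_pts):
        # Each 3D->2D correspondence gives two linear constraints on P
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -x*X, -x*Y, -x*Z, -x])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -y*X, -y*Y, -y*Z, -y])
    A = np.asarray(rows, dtype=float)
    # Solution (up to scale) is the right singular vector belonging
    # to the smallest singular value, i.e. A's null-space direction
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)
```

The point being: every part of this needs known world-space (X, Y, Z) positions going in.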
To describe my progress: my system can currently track my RED RAW camera footage, but it can only reach a camera solve when I know the world-space positions of the 2D-tracked markers. I'm really unsure how to get to a camera pose when I don't know the global position of any of the tracked features. In short: no survey data, and no camera intrinsics for that matter.
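To make the gap concrete (hypothetical data layouts, just to illustrate what I do and don't have):

```python
# What my current solver consumes: a surveyed world position
# paired with its tracked 2D pixel location
surveyed = {"world_xyz": (1.25, 0.0, 3.40), "pixel_xy": (812.3, 455.1)}

# What unsurveyed footage gives me: only 2D pixel tracks,
# per feature, per frame; no world positions, no intrinsics
unsurveyed = {"feature_id": 17, "frame": 42, "pixel_xy": (812.3, 455.1)}
```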
I'll just mention quickly that, generally speaking, I understand 3D and cameras fairly well - just not how to code them. I'm self-taught, which I'm sure is hindering my search efforts, among a few other things!
I'm developing my software around knowing things like the camera intrinsics and having some sort of survey data. But inevitably, I'm going to be handed footage that has no 'supporting information'. Thus, I'm trying to make my software a little more flexible for the times when I have little to go on.
For reference, I'm thinking of programs that can do 'auto-tracking'. That is, load your footage, hit auto-track, let the system do its thing, and 'bam', you have some sort of camera solve at the end. They must be doing something here that I'm not. Using another type of solver? Or guessing something, perhaps? How would this step work?
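From the little I've pieced together, I suspect the first step is relating two frames purely through their shared 2D tracks, via something called the fundamental matrix (the 'eight-point algorithm'?). This is my rough understanding, again sketched in numpy just for readability - I may well have it wrong:

```python
import numpy as np

def eight_point(pts1, pts2):
    """Estimate the fundamental matrix F from >= 8 corresponding 2D
    points across two frames: no 3D positions or intrinsics required.
    (Skipping the usual coordinate normalization for brevity.)"""
    rows = []
    for (x1, y1), (x2, y2) in zip(pts1, pts2):
        # Epipolar constraint x2' * F * x1 = 0, linear in F's entries
        rows.append([x2*x1, x2*y1, x2, y2*x1, y2*y1, y2, x1, y1, 1])
    A = np.asarray(rows, dtype=float)
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # A valid F is rank 2: zero out the smallest singular value
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt
```

If that's on the right track, I'd guess the rest goes something like: assume plausible intrinsics, turn F into an essential matrix, decompose that into a relative camera pose, triangulate the tracked points from there, then refine the whole thing. Is that roughly what those auto-track solvers are doing?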
Not having any survey data is most likely the main problem. Beyond the guess above, I'm not sure what to try - I can't see how to go from 2D to 3D when the starting locations are unknown. Note that I'm not using any libraries (no OpenCV, etc.); it's mostly my own naive implementation.
I'll probably need a fairly rudimentary answer, but any education, insight, direction and/or description here will be hugely appreciated.