When it comes to cascade classifiers (using Haar-like features), I always read that methods like AdaBoost are used to select the 'best' features for detection. However, this only works if there is some initial set of features to start boosting from.
Given a 24x24 pixel detection window, there are 162,336 possible Haar features. I might be wrong here, but I don't think libraries like OpenCV initially test against all of these features.
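(For what it's worth, this is the brute-force enumeration I base that number on, assuming the five standard feature shapes: two 2-rectangle, two 3-rectangle and one 4-rectangle type.)

```python
# Brute-force count of all Haar-like features in a 24x24 window,
# assuming the five standard base shapes given as (width, height).
WINDOW = 24
BASE_SHAPES = [(2, 1), (1, 2), (3, 1), (1, 3), (2, 2)]

def count_features(window, base_w, base_h):
    total = 0
    # every integer multiple of the base size that still fits in the window
    for w in range(base_w, window + 1, base_w):
        for h in range(base_h, window + 1, base_h):
            # every (x, y) position at which a w x h rectangle fits
            total += (window - w + 1) * (window - h + 1)
    return total

print(sum(count_features(WINDOW, bw, bh) for bw, bh in BASE_SHAPES))  # 162336
```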
So my question is: how are the initial features selected, or how are they generated? Is there any guideline for the initial number of features?
And if all 162,336 features are used initially, how are they generated?
I presume you're familiar with Viola/Jones' original work on this topic.
You start by manually choosing a feature type (e.g. Rectangle A). This gives you a mask with which you can train your weak classifiers. To avoid moving the mask pixel by pixel and retraining (which would take a huge amount of time without yielding any better accuracy), you can specify how far the feature moves in the x and y directions per trained weak classifier. The size of these jumps depends on your data size. The goal is for the mask to be able to move in and out of the object you want to detect. The size of the feature can also be variable.
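To illustrate (the base shape, step sizes and scale step below are made-up parameters you would tune to your own data), generating the candidate masks for one feature type could look roughly like this:

```python
def generate_masks(window=24, base_w=2, base_h=1,
                   step_x=4, step_y=4, scale_step=2):
    """Yield (x, y, w, h) candidate masks for one feature type.

    Instead of enumerating every position and size, the mask jumps
    step_x/step_y pixels at a time and only grows in coarse scale steps,
    so far fewer weak classifiers have to be trained.
    """
    w = base_w
    while w <= window:
        h = base_h
        while h <= window:
            for y in range(0, window - h + 1, step_y):
                for x in range(0, window - w + 1, step_x):
                    yield (x, y, w, h)
            h *= scale_step
        w *= scale_step

masks = list(generate_masks())
print(len(masks))   # a few hundred candidates instead of tens of thousands
```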
After you've trained multiple classifiers, each with its respective feature (i.e. mask position and size), you proceed with AdaBoost and cascade training as usual.
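To make the "as usual" a bit more concrete, here is a minimal sketch of discrete AdaBoost over the weak classifiers you trained; `weak_predictions` is an assumed matrix holding the ±1 output of every trained weak classifier on every training sample:

```python
import numpy as np

def adaboost(weak_predictions, labels, n_rounds=10):
    """weak_predictions: (n_classifiers, n_samples) array of +/-1 outputs.
    labels: (n_samples,) array of +/-1 ground-truth labels.
    Returns a list of (classifier_index, alpha) pairs."""
    n_clf, n_samples = weak_predictions.shape
    weights = np.full(n_samples, 1.0 / n_samples)
    strong = []
    for _ in range(n_rounds):
        # weighted error of every weak classifier under current sample weights
        errors = (weak_predictions != labels).astype(float) @ weights
        best = int(np.argmin(errors))
        err = errors[best]
        if err >= 0.5:          # no classifier better than random left
            break
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))
        strong.append((best, alpha))
        # increase the weight of samples this classifier got wrong
        weights *= np.exp(-alpha * labels * weak_predictions[best])
        weights /= weights.sum()
    return strong
```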
The number of features/weak classifiers is highly dependent on your data and experimental setup (i.e. also on the type of classifier you use). You'll need to test the parameters extensively to find out which types of features work best (rectangles, circles, Tetris-like shapes, etc.). I worked on this two years ago, and it took us quite a long time to evaluate which features and feature-generation heuristics yielded the best results.
If you want to start somewhere, just take one of the four original Viola/Jones features and train a classifier applying it anchored at (0,0). Train the next classifier with (x,0), the next with (2x,0), ..., then (0,y), (0,2y), ..., (x,y), (x,2y), and so on, and see what happens. Most likely you'll see that it's OK to have fewer weak classifiers, i.e. you can go on to increase the x/y step values that determine how the mask slides. You can also have the mask grow or do other things to save time. The reason this "lazy" feature generation works is AdaBoost: as long as these features make the classifiers slightly better than random, AdaBoost will combine them into a meaningful classifier.
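As a minimal sketch of one such weak classifier (a plain two-rectangle feature evaluated on an integral image, plus a simple threshold stump; the function names are just for illustration):

```python
import numpy as np

def integral_image(img):
    # summed-area table with a zero row/column so box sums need no branching
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(0).cumsum(1)
    return ii

def box_sum(ii, x, y, w, h):
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature(ii, x, y, w, h):
    # Viola/Jones "edge" feature: left half minus right half (assumes even w)
    return box_sum(ii, x, y, w // 2, h) - box_sum(ii, x + w // 2, y, w // 2, h)

def train_stump(values, labels):
    # pick the threshold/polarity pair with the lowest error
    best = (1.0, 0.0, 1)
    for thresh in np.unique(values):
        for polarity in (1, -1):
            pred = np.where(polarity * values < polarity * thresh, 1, -1)
            err = np.mean(pred != labels)
            if err < best[0]:
                best = (err, thresh, polarity)
    return best   # usable for AdaBoost as long as err < 0.5
```

At each anchor you'd compute the feature response for every training window, fit such a stump, and keep the (feature, stump) pair as a weak classifier as long as its error stays below 0.5.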