I came across an MCQ in Andrew Ng's lecture on the non-linear hypothesis for neural networks, asking for the number of features for a 100x100 image of grayscale intensities.
The answer was 50 million, i.e. 5 x 10^7.
However, earlier in the lecture, for a 50 x 50 pixel grayscale image, the number of features is 50x50 (2500), and for an RGB image it is 7500.
Why would it be 5 x 10^7 instead of 10,000?
He does, however, say to include all quadratic terms (xi,xj) as features.
The question is:
Suppose you are learning to recognize cars from 100×100 pixel images (grayscale, not RGB). Let the features be pixel intensity values. If you train logistic regression including all the quadratic terms (xi,xj) as features, about how many features will you have?
And earlier he added that, if we were to use the xi, xj terms, we would end up with a total of 3 million features. Still, I couldn't figure out what this relation is.
You are conflating two things with similar names: the number of features of the image (= pixels) and the number of features the logistic regression algorithm needs to learn in order to solve the classification problem.
For the 100x100 pixel image, you have 10,000 pixels. But if you have a complex classification problem, it's not enough to learn a linear model over these pixels (e.g. theta0 + theta1*x1 + theta2*x2); you also need to include the quadratic terms, i.e. all products of two pixel values, which results in many more terms (= features) in your equation (e.g. theta0 + theta1*x1 + theta2*x2 + theta3*x1² + theta4*x1*x2 + theta5*x2²). This is what he meant by "including all the quadratic terms (xi,xj) as features".
As you can see, we have all combinations of the quadratic terms of x1 and x2 in the equation above.
How many terms (= features) you need depends on the complexity of the classification problem you want to solve. Counting them makes the relation clear: with n = 10,000 pixels, the number of quadratic terms xi*xj (with i ≤ j) is n(n+1)/2 ≈ 5 x 10^7, which is where the 50 million comes from; for the 50x50 image (n = 2,500) the same count is about 3 million, the number he mentioned earlier.
This is why you get such a high number of features from a much smaller number of pixels. (He also shows an example of this around the 2-minute mark in the video.)
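As a quick sanity check (my own sketch, not from the lecture), the counts follow directly from the combinatorics — n squares plus C(n, 2) distinct pairwise products:

```python
from math import comb

def num_quadratic_features(n):
    # All products x_i * x_j with i <= j:
    # n squares (i == j) plus C(n, 2) distinct pairs (i < j).
    return n + comb(n, 2)

# 100x100 grayscale image: n = 10,000 pixel features
print(num_quadratic_features(100 * 100))  # 50005000, i.e. ~5 x 10^7

# 50x50 grayscale image: n = 2,500 pixel features
print(num_quadratic_features(50 * 50))    # 3126250, i.e. ~3 million
```

(In practice, scikit-learn's `PolynomialFeatures(degree=2)` generates exactly these terms, plus the bias and linear terms.)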