I'm working with Principal Component Analysis (PCA) in OpenCV. The constructor overload I'm interested in is:
PCA(InputArray data, InputArray mean, int flags, double retainedVariance);
Regarding the InputArray 'data', the documentation states that the appropriate flags are:
- CV_PCA_DATA_AS_ROW indicates that the input samples are stored as matrix rows.
- CV_PCA_DATA_AS_COL indicates that the input samples are stored as matrix columns.
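To make the two flags concrete, here is a minimal sketch in plain C++ (no OpenCV calls) of how the same matrix shape reads under each layout. The `Layout` enum, `PcaShape` struct, and `interpretShape` function are hypothetical names made up for illustration; they only mirror the documented meaning of the flags and are not part of the OpenCV API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for CV_PCA_DATA_AS_ROW / CV_PCA_DATA_AS_COL.
enum class Layout { SamplesAsRows, SamplesAsCols };

struct PcaShape {
    std::size_t sampleCount;  // number of observations
    std::size_t dimension;    // number of values per observation
};

// Interpret a rows x cols matrix under the given layout.
PcaShape interpretShape(std::size_t rows, std::size_t cols, Layout layout) {
    assert(rows > 0 && cols > 0);
    if (layout == Layout::SamplesAsRows) {
        return {rows, cols};  // each row is one sample
    }
    return {cols, rows};      // each column is one sample
}
```

Under these assumptions, a matrix with 8 rows and 4 columns is read as 4 samples of dimension 8 when samples are columns, and as 8 samples of dimension 4 when samples are rows.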
My question pertains to the use of the term 'samples' in that I'm not sure what a sample is in this context.
For example, let's say I have 4 sets of data, labelled A-D for the sake of illustration. Each set A through D has 8 elements. They are arranged in the Mat variable I'll use as the InputArray with one set per column, i.e. 8 rows by 4 columns.
The question is, which is it:
- Are my sets the samples?
- Or are my data elements the samples?
Another way of asking:
- Do I have 4 samples (CV_PCA_DATA_AS_COL)?
- Or do I have 4 sets of 8 samples (CV_PCA_DATA_AS_ROW)?
As a guess, I'd choose CV_PCA_DATA_AS_COL (i.e. I have 4 samples), but that's just where my head is at. Until I learn the correct terminology, the word 'sample' seems to fit either reading.
Ugh...
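So the answer was found by reversing the logic in the documentation for the PCA::project step, which requires the vectors being projected to have the same dimensionality and layout as the data used to build the PCA.
i.e. a 'sample' is equivalent to a 'set', and the number of elements in each set is the 'dimension'.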
(and my guess was correct :)
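One way to sanity-check this reading is to look at the mean: PCA subtracts a per-dimension mean, so the mean vector must have one entry per dimension, not one per sample. The sketch below is plain C++ rather than OpenCV code; `Matrix` and `meanOfColumnSamples` are hypothetical helpers, and it assumes a row-major matrix in which each column is one sample (the CV_PCA_DATA_AS_COL case).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major matrix: element (r, c) lives at values[r * cols + c].
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> values;
};

// Mean over samples when each column is one sample.
// The result has one entry per row, i.e. one per dimension.
std::vector<double> meanOfColumnSamples(const Matrix& m) {
    assert(m.cols > 0 && m.values.size() == m.rows * m.cols);
    std::vector<double> mean(m.rows, 0.0);
    for (std::size_t r = 0; r < m.rows; ++r) {
        for (std::size_t c = 0; c < m.cols; ++c) {
            mean[r] += m.values[r * m.cols + c];
        }
        mean[r] /= static_cast<double>(m.cols);
    }
    return mean;
}
```

For the A-D example, with the four sets as columns of an 8-row matrix, this would give a mean of length 8 (one value per element position), averaged across the 4 sets.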