I'm implementing a content-based image retrieval (CBIR) system based on feature extraction with histograms, HOG and local binary patterns. Each of these (normalized) feature sets is stored in its own CSV file so that distances can be calculated in a later step. Such a file looks like this:
img_ID0, 0.0, 0.0, 0.0, 0.4, 0.1, ...
img_ID1, 0.0, 0.1, 0.0, 0.2, 0.1, ...
img_ID2, 0.2, 0.0, 0.0, 0.4, 0.0, ...
I flatten the ndarray and normalize along the entire flattened array, which should be sample-wise normalization (I'm not sure about this, so please correct me).
Now, what would a feature-wise normalization look like, especially if I don't really have "named" columns? Should I have normalized along the (not flattened) image, or later on the flattened arrays, column-wise over all images?
The literature just says that feature-wise normalization is commonly used, but that it still depends on the application. The CBIR literature seems to be very vague about this.
Assuming your data before any normalization is laid out like the rows in your question:
Each line is a new image (= sample), and each column is a feature. In that case, sample-wise normalization would be normalizing along each line ("What is the relative value of feature X compared to feature Y for sample N?"), and feature-wise normalization would be normalizing along each column ("What is the relative value of feature X for sample N compared to sample M?").
Flattening the array before normalizing like you did would be yet another type of normalization. Also see https://stats.stackexchange.com/questions/354774/should-i-normalize-featurewise-or-samplewise
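The distinction above can be sketched with NumPy. This is only an illustration under assumptions: the matrix `X` and the function names are made up, L1 scaling (rows sum to 1) stands in for sample-wise normalization, min-max scaling stands in for feature-wise normalization, and `normalize_global` is one possible reading of "normalizing the entire flattened array". Other choices (L2 norm, z-score) would follow the same axis logic.

```python
import numpy as np

# Hypothetical feature matrix: one row per image (sample), one column per feature.
X = np.array([
    [0.0, 2.0, 2.0],
    [1.0, 1.0, 2.0],
    [3.0, 0.0, 2.0],
])

def normalize_samplewise(X):
    """Scale each row (image) so that its entries sum to 1."""
    row_sums = X.sum(axis=1, keepdims=True)
    # Replace zero sums by 1 to avoid dividing by zero for all-zero rows.
    return X / np.where(row_sums == 0, 1.0, row_sums)

def normalize_featurewise(X):
    """Min-max scale each column (feature) to [0, 1] across all images."""
    col_min = X.min(axis=0, keepdims=True)
    col_range = X.max(axis=0, keepdims=True) - col_min
    # Constant columns have range 0; they are mapped to 0 instead of NaN.
    return (X - col_min) / np.where(col_range == 0, 1.0, col_range)

def normalize_global(X):
    """Scale the whole matrix by a single factor (neither row- nor column-wise)."""
    total = X.sum()
    return X / (total if total != 0 else 1.0)

X_sample = normalize_samplewise(X)    # compares features within one image
X_feature = normalize_featurewise(X)  # compares images within one feature
X_global = normalize_global(X)        # keeps all relative magnitudes as they are
```

Note that for the feature-wise variant the column statistics come from the whole image database, so a query image would have to be scaled with those same stored minima and ranges before computing distances.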