How to normalize feature-wise and sample-wise?


I'm implementing content-based image retrieval (CBIR) based on feature extraction with histograms, HOG, and local binary patterns. Each of these (normalized) feature sets is stored separately in a CSV file so that distances can be calculated in a later step. The file looks like this:

img_ID0, 0.0, 0.0, 0.0, 0.4, 0.1, ...
img_ID1, 0.0, 0.1, 0.0, 0.2, 0.1, ...
img_ID2, 0.2, 0.0, 0.0, 0.4, 0.0, ...

I flatten the ndarray and normalize along the entire flattened array, which should be sample-wise normalization (I'm not sure about this, so please correct me).

Now, what would feature-wise normalization look like, especially since I don't really have "named" columns? Should I have normalized along the (unflattened) image, or later on the flattened arrays, column-wise over all images?

The literature only says that feature-wise normalization is commonly used but that the choice still depends on the application. CBIR sources seem to be very vague about this.

There are 1 best solutions below

Assuming your data before any normalization looks like this:

img_ID0, feat1_val, feat2_val, feat3_val,...
img_ID1, feat1_val, feat2_val, feat3_val,...
img_ID2, feat1_val, feat2_val, feat3_val,...

Each line is a new image (= sample), and each column is a feature. In that case, sample-wise normalization would be normalizing along each line ("What is the relative value of feature X compared to feature Y for sample N?"), and feature-wise normalization would be normalizing along each column ("What is the relative value of feature X for sample N compared to sample M?").
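
As a rough sketch of the difference, assuming the CSV has been loaded into a NumPy array with one row per image and one column per feature: the matrix `X` and both helper functions below are made up for illustration, and the choice of L2 scaling for rows and min-max scaling for columns is only one common option, not the only valid one.

```python
import numpy as np

# Hypothetical feature matrix: one row per image (sample), one column per feature.
X = np.array([
    [0.0, 2.0, 4.0],
    [1.0, 1.0, 2.0],
    [3.0, 0.0, 6.0],
])

def normalize_samplewise(X, eps=1e-12):
    """Scale each row to unit L2 norm (compares features within one image)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, eps)  # eps guards against all-zero rows

def normalize_featurewise(X, eps=1e-12):
    """Min-max scale each column to [0, 1] (compares one feature across images)."""
    col_min = X.min(axis=0, keepdims=True)
    col_range = X.max(axis=0, keepdims=True) - col_min
    return (X - col_min) / np.maximum(col_range, eps)  # eps guards constant columns

X_sample = normalize_samplewise(X)    # statistics computed along axis=1 (per row)
X_feature = normalize_featurewise(X)  # statistics computed along axis=0 (per column)
```

Note that the columns do not need names for this: feature-wise simply means the statistics are taken over `axis=0`, so column position is enough.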

Flattening the array before normalizing, as you did, would be yet another type of normalization. Also see https://stats.stackexchange.com/questions/354774/should-i-normalize-featurewise-or-samplewise
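
One possible reading of this third type, stated here as an assumption rather than as what the question's code actually does, is a global normalization in which a single minimum and range are computed over every value in the matrix. A minimal sketch with a made-up matrix `X`:

```python
import numpy as np

# Hypothetical feature matrix: rows are images, columns are features.
X = np.array([
    [0.0, 2.0, 4.0],
    [1.0, 1.0, 2.0],
    [3.0, 0.0, 6.0],
])

# Global min-max: one minimum and one range for the entire matrix,
# so neither rows nor columns are treated separately.
flat = X.ravel()
value_range = flat.max() - flat.min()
X_global = (X - flat.min()) / max(value_range, 1e-12)  # guard against a constant matrix
```

With this variant the relative sizes of all entries are preserved, so features with a naturally larger scale still dominate a distance computation.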