Calculating distance/similarity with grouped features


I have tabular data in which every feature is a binary indicator.

Student   Group 1 - SubGroup1   Group 1 - SubGroup2   Group 2 - SubGroup1   Group 2 - SubGroup2
A         1                     0                     0                     0
B         0                     1                     0                     0
C         0                     0                     0                     1

A "1" means the student belongs to that specific subgroup of a group, and a "0" means the student does not belong to that group-subgroup. Note that a row is not necessarily a one-hot vector (a student may belong to several group-subgroups).

I would like to calculate the pairwise distances (or similarities). However, I want the distance between A and B to be smaller than the distance between A and C or between B and C, because A and B belong to the same group (although their subgroups differ), while C belongs to a completely different group. Is there a known distance or similarity measure that lets me adjust the "weight" attached to features depending on the group they belong to?

I learned that Hamming distance is suitable for measuring the distance between two binary strings, but in this case it gives equal distances for all of A-B, B-C and C-A (and so does Euclidean distance).
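
To make the tie concrete, here is a minimal sketch on the three rows from the table above. It first computes plain Hamming and Euclidean distances, then shows one possible workaround purely as an illustration: adding a group-level Hamming term on top of the subgroup-level one. The `groups` mapping, the helper names and the `w_group` weight are assumptions made for this example, not an established method or library API.

```python
import numpy as np

# Columns: G1-Sub1, G1-Sub2, G2-Sub1, G2-Sub2 (the toy table from the question)
X = np.array([
    [1, 0, 0, 0],  # A
    [0, 1, 0, 0],  # B
    [0, 0, 0, 1],  # C
])

# Assumed mapping from each subgroup column to its parent group
groups = np.array([0, 0, 1, 1])


def pairwise_hamming(M):
    """Number of differing positions for every pair of rows."""
    return (M[:, None, :] != M[None, :, :]).sum(axis=2)


def group_indicators(M, groups):
    """One column per group: 1 if the row has any subgroup in that group."""
    cols = [M[:, groups == g].max(axis=1) for g in np.unique(groups)]
    return np.stack(cols, axis=1)


def group_aware_distance(M, groups, w_group=1.0):
    """Subgroup-level Hamming plus a weighted group-level Hamming term.

    This particular form is an assumption for illustration only.
    """
    G = group_indicators(M, groups)
    return pairwise_hamming(M) + w_group * pairwise_hamming(G)


plain = pairwise_hamming(X)
euclid = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
aware = group_aware_distance(X, groups)

print(plain[0, 1], plain[0, 2], plain[1, 2])  # 2 2 2 (all pairs tie)
print(aware[0, 1], aware[0, 2], aware[1, 2])  # 2.0 4.0 4.0 (A-B is now closest)
```

With `w_group=0` this reduces to plain Hamming distance; larger values push students in different groups further apart, so the weight controls how much the group level matters relative to the subgroup level.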
