I have tabular data in which every feature is binary:
| Student | Group 1 - SubGroup1 | Group 1 - SubGroup2 | Group 2 - SubGroup1 | Group 2 - SubGroup2 |
|---|---|---|---|---|
| A | 1 | 0 | 0 | 0 |
| B | 0 | 1 | 0 | 0 |
| C | 0 | 0 | 0 | 1 |
A "1" means the student belongs to that subgroup of that group, and a "0" means the student does not. Note that a row is not necessarily a one-hot vector (a student may belong to several group-subgroup combinations).
I would like to calculate the pairwise distances (or similarities). However, I want the distance between A and B to be smaller than the distance between A and C or between B and C, because A and B belong to the same group (although their subgroups differ), while C belongs to a completely different group. Is there a known distance or similarity measure that lets me adjust the "weight" attached to features depending on the group they belong to?
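To make the requirement concrete, here is a minimal sketch of the kind of behaviour I am after. It is not a named, established metric as far as I know: the `groups` column layout, the `group_membership` helper, the `group_aware_distance` function, and the weight `w` are all hypothetical names I made up for illustration. The idea is simply to add a penalty for disagreeing at the group level on top of the subgroup-level mismatch count.

```python
import numpy as np

# Rows are students A, B, C; columns follow the table above.
X = np.array([
    [1, 0, 0, 0],  # A: Group 1 - SubGroup1
    [0, 1, 0, 0],  # B: Group 1 - SubGroup2
    [0, 0, 0, 1],  # C: Group 2 - SubGroup2
])

# Assumed layout: which columns belong to which group.
groups = [[0, 1], [2, 3]]

def group_membership(x, groups):
    """1 for each group in which the student has at least one subgroup."""
    return np.array([int(x[cols].any()) for cols in groups])

def group_aware_distance(u, v, groups, w=1.0):
    """Subgroup-level mismatches plus w times group-level mismatches."""
    sub = np.sum(u != v)
    grp = np.sum(group_membership(u, groups) != group_membership(v, groups))
    return float(sub + w * grp)

d_ab = group_aware_distance(X[0], X[1], groups)
d_ac = group_aware_distance(X[0], X[2], groups)
d_bc = group_aware_distance(X[1], X[2], groups)
print(d_ab, d_ac, d_bc)
```

With any `w > 0`, A-B comes out smaller than A-C and B-C, which is the ordering I want. What I do not know is whether this corresponds to a known, well-studied measure.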
I have learned that the Hamming distance is suitable for measuring the distance between two binary strings, but in this case it gives the same distance for all three pairs A-B, B-C, and C-A (and so does the Euclidean distance).
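A quick check of that claim on the three rows from the table, as a small sketch using NumPy (the `hamming` and `euclidean` helpers are just illustrative names):

```python
import numpy as np

# Rows are students A, B, C; columns follow the table above.
X = np.array([
    [1, 0, 0, 0],  # A
    [0, 1, 0, 0],  # B
    [0, 0, 0, 1],  # C
])

def hamming(u, v):
    """Number of positions where the two binary vectors differ."""
    return int(np.sum(u != v))

def euclidean(u, v):
    """Ordinary Euclidean (L2) distance."""
    return float(np.linalg.norm(u - v))

pairs = {"A-B": (0, 1), "B-C": (1, 2), "C-A": (2, 0)}
hamming_d = {name: hamming(X[i], X[j]) for name, (i, j) in pairs.items()}
euclid_d = {name: euclidean(X[i], X[j]) for name, (i, j) in pairs.items()}

print(hamming_d)  # every pair differs in exactly 2 positions
print(euclid_d)   # every pair is sqrt(2) apart
```

Both measures treat all four columns as interchangeable, so neither can tell that A and B share a group.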