Good evening to you all,
I'm trying to group data according to their values on 3 columns together. I would like to group rows that have X, Y and Z coordinates that are less than 12% different from each other. In other words, for each observation, I would like to check whether their X, Y and Z values are less than 12% different from another observation in the table. My data frame looks more or less like this:
ID | X | Y | Z | |
---|---|---|---|---|
1 | 33 | 3 | 1 | |
2 | 27 | 42 | 1 | |
3 | 32 | 3 | 58 | |
4 | 33 | 3 | 1 | |
5 | 25 | 2 | 22 | |
6 | 37 | 119 | 3 |
IDs 1 and 4 should be part of the same group as there is less than 12% of difference between their X, Y and Z coordinates. I was thinking of looping through the whole dataframe (looping through x to find similar rows, then through Y to find similar rows and finally through Z and then check if some results would appear through X, Y and Z, but it is awfully complicated and seems to not be working very well. I tried as well to group them by hand in such a way:
df %>% group_by(IDscrit) %>% summarize(n())
df$group <- ifelse(df$X <2 & df$Y <2 & df$Z <2, "group1",
ifelse(df$X <5 & df$X >2 & df$Y <5 &df$Y >2 & df$Z <5 & df$Z >2, "group2", NA))
But for very obvious reasons (not handy and I am excluding a lot of values of being grouped arbitrarily although they could be close to each other, like the for example if my table was displaying these values:
ID | X | Y | Z |
---|---|---|---|
14 | 4 | 3 | 1 |
26 | 6 | 3 | 2 |
they would not be grouped, although they should), therefore this cannot work either... I would be super grateful for any advice, really any... It is way above my abilities in coding and I don't even know by which part I should pick this problem anymore.
The following gives
max {abs(X_a - X_b), abs(Y_a - Y_b), abs(Z_a - Z_b)}
Output:
Now, you can form the groups you want, I guess.