I recently worked with several kNN functions and found a few capable packages for the task. I tried three of them (BiocNeighbors, FNN, RANN) to find the nearest neighbors of each point. However, BiocNeighbors with the Annoy backend ('RcppAnnoy') gives a different result for the 4th point: its last neighbor should be 8, but BiocNeighbors returns 3.
The reproducible code is below: `
set.seed(1234567)
cls_1_c1 <- rnorm(3, mean = 1, sd = 0.2)
cls_1_c2 <- rnorm(3, mean = 2, sd = 0.8)
cls_2_c1 <- rnorm(3, mean = 4, sd = 0.2)
cls_2_c2 <- rnorm(3, mean = 6, sd = 0.8)
cls_3_c1 <- rnorm(3, mean = 7, sd = 0.2)
cls_3_c2 <- rnorm(3, mean = 8, sd = 0.8)
dat <- cbind(c(cls_1_c1, cls_2_c1, cls_3_c1), c(cls_1_c2, cls_2_c2, cls_3_c2))
colnames(dat) <- c("c1", "c2")
dat <- as.data.frame(dat)
dat$name <- paste0("p", 1:9)
plot(x = dat$c1, y = dat$c2, xlab = "x", ylab = "y")
text(dat$c1, dat$c2, dat$name)
dat_mat <- as.matrix(dat[, c("c1", "c2")])
res_annoy <- BiocNeighbors::findKNN(dat_mat, k = 3, BNPARAM = BiocNeighbors::AnnoyParam(ntrees = 1000))
print(res_annoy$index)
res_fnn <- FNN::knn.index(dat_mat, k = 3)
print(res_fnn)
res_rann <- RANN::nn2(data = dat_mat, query = dat_mat, k = 4)
print(res_rann$nn.idx[, -1])`
As you can see, the 4th row from print(res_annoy$index) differs from the other two, which agree with each other.
` > print(res_annoy$index)
[4,] 5 6 3 *
> print(res_fnn)
[4,] 5 6 8 *
> print(res_rann$nn.idx[, -1])
[4,] 5 6 8 *
` Could you please help me understand what might cause this difference, given that the input is identical?
Thanks.
I ran the script above and expected the same results from all three methods.
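As a sanity check on which answer is exact, the neighbors can also be computed by brute force in base R. The following is a minimal sketch, assuming plain Euclidean distance; `exact_idx` and `exact_dist` are names introduced here for illustration and do not come from any of the three packages.

```r
# Rebuild the same 9 x 2 matrix as in the script above (same seed, same draw order).
set.seed(1234567)
cls_1_c1 <- rnorm(3, mean = 1, sd = 0.2)
cls_1_c2 <- rnorm(3, mean = 2, sd = 0.8)
cls_2_c1 <- rnorm(3, mean = 4, sd = 0.2)
cls_2_c2 <- rnorm(3, mean = 6, sd = 0.8)
cls_3_c1 <- rnorm(3, mean = 7, sd = 0.2)
cls_3_c2 <- rnorm(3, mean = 8, sd = 0.8)
dat_mat <- cbind(c1 = c(cls_1_c1, cls_2_c1, cls_3_c1),
                 c2 = c(cls_1_c2, cls_2_c2, cls_3_c2))

# Full pairwise Euclidean distance matrix (base R, no approximation).
d <- as.matrix(dist(dat_mat))
diag(d) <- Inf  # exclude each point from its own neighbor list

k <- 3
# For each point, the indices of its k nearest neighbors, nearest first.
exact_idx <- t(apply(d, 1, function(row) order(row)[seq_len(k)]))
# The matching distances, useful for seeing how close the candidates are.
exact_dist <- t(apply(d, 1, function(row) sort(row)[seq_len(k)]))

print(exact_idx)
print(round(d[4, ], 3))  # distances from p4 to every other point
```

Printing `d[4, ]` shows how far p3 and p8 each are from p4, which makes it easy to see which of the two belongs in the third column of row 4.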