I have to calculate the pairwise distances between two-dimensional points. The points are stored in a matrix, with the x values in the first column and the y values in the second. The distance function I need is not the Euclidean metric, though, but a custom metric.
For the Euclidean distance, the rdist() function from the fields package gives me what I want:
require(fields)
a = matrix(c(0,7,3,2,0,8,2,8), nrow=4)
b = matrix(c(2,6,2,6,2,2,6,6), nrow=4)
rdist(a,b)
         [,1]     [,2]     [,3]     [,4]
[1,] 2.828427 6.324555 6.324555 8.485281
[2,] 7.810250 6.082763 5.385165 2.236068
[3,] 1.000000 3.000000 4.123106 5.000000
[4,] 6.000000 7.211103 2.000000 4.472136
To use my own metric, I wrote a simple rdist() replacement that computes the distance for every pair of points:
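my_rdist <- function(a, b) {
  # one row per point in a, one column per point in b
  a_len = nrow(a)
  b_len = nrow(b)
  distmatrix = matrix(data=NA, nrow=a_len, ncol=b_len)
  # seq_len() yields an empty sequence for zero rows, unlike seq(1, 0)
  for(i in seq_len(a_len)) {
    for(j in seq_len(b_len)) {
      # my_distance() is the custom metric, applied to two (x, y) vectors
      distmatrix[i,j] = my_distance( a[i,], b[j,] )
    }
  }
  return(distmatrix)
}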
This works as expected, but it is painfully slow: my_rdist() takes about 20 minutes, whereas rdist() from the fields package needs less than 2 seconds. (At the moment the custom metric just computes the square of the Euclidean distance. The idea is to penalize larger distances in the subsequent processing of my data set.)
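For reference, here is a minimal sketch of what my_distance() boils down to at the moment, assuming each point arrives as a numeric vector c(x, y), which is how my_rdist() calls it. The body and the argument names p and q are a stand-in written for this post, not the exact code I run:

```r
# Hypothetical stand-in for my_distance(): squared Euclidean distance
# between two points, each given as a numeric vector c(x, y).
my_distance <- function(p, q) {
  sum((p - q)^2)
}

# First row of a against first row of b from the example above:
# (0 - 2)^2 + (0 - 2)^2 = 8
my_distance(c(0, 0), c(2, 2))
```

With this definition, my_rdist(a, b) should return the element-wise square of the rdist(a, b) matrix shown earlier.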
Are there any replacements for rdist() that I'm not aware of that can handle custom metric functions? Or can you give me any hints for speeding up my my_rdist() function? I'm pretty new to R, so perhaps I've made some obvious mistakes.