Calculate large number of permutations in R


I have two large data frames in R, each with roughly 100k rows, holding lists of geographic coordinates (lat/long). I want to pair every row of one with every row of the other and then apply a function to each pair.

Because the number of combinations is around 11 billion (11 x 1,000,000,000), my original idea of using a loop is not practical.

The data frames look something like this:

A <- data.frame(V1 = c(-0.1822, -0.4419, 0.2262),
                V2 = c(51.5307, 51.4856, 51.4535))

(...)

     V1      V2
-0.1822 51.5307
-0.4419 51.4856
 0.2262 51.4535

B <- data.frame(V1 = c(-0.4764, -0.2142, -0.2197),
                V2 = c(51.5221, 51.4593, 51.5841))

(...)

     V1      V2
-0.4764 51.5221
-0.2142 51.4593
-0.2197 51.5841

I would like the output to look like:

    V1a     V2a     V1b     V2b
-0.1822 51.5307 -0.4764 51.5221
-0.4419 51.4856 -0.4764 51.5221
 0.2262 51.4535 -0.4764 51.5221
-0.1822 51.5307 -0.2142 51.4593
-0.4419 51.4856 -0.2142 51.4593

(...)

Another post here on Stack Overflow ("Calculating great-circle distance matrix") suggests using:

apply(A, 1, FUN=function(X) distHaversine(X, B))

However, I suspect the resulting matrix is too large for the calculation to complete.

Any ideas on how to solve this efficiently, keeping in mind that my objective is then to apply the Haversine function to calculate the distances between the points?
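A rough back-of-the-envelope check supports that suspicion. This is only a sketch: it assumes exactly 100,000 rows in each data frame (the question says "circa 100k") and 8 bytes per stored distance, which is the size of an R double.

```r
# Assumed sizes: the question only gives "circa 100k rows" per data frame.
n_a <- 1e5
n_b <- 1e5

n_pairs <- n_a * n_b       # number of (row of A, row of B) pairs
bytes   <- n_pairs * 8     # one double-precision distance per pair
gib     <- bytes / 1024^3  # convert bytes to GiB

n_pairs
gib
```

Under these assumed sizes that is 10 billion pairs and about 74.5 GiB for the distances alone, before counting the four coordinate columns of a fully expanded table.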

Thanks J

There are 2 best solutions below

BEST ANSWER
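# all pairs of row indices: Var1 indexes rows of A, Var2 indexes rows of B
cmb <- expand.grid(seq_len(nrow(A)), seq_len(nrow(B)))
# look up the coordinates for each index pair and bind them side by side
cbind(A[cmb[, 1], ], B[cmb[, 2], ])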

Unlike Andre's solution, this won't create combinations of the columns within each of A and B (his creates 81 rows, whereas for this sample only 9 are desired). I'm not sure it will scale to your larger dataset, though, since the full table of pairs still has to fit in memory.
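A quick check on the sample data from the question illustrates the two row counts. This is a sketch under two assumptions: that the column-wise call being compared is `expand.grid(cbind(A, B))`, and that the output columns should be named `V1a`, `V2a`, `V1b`, `V2b` as in the desired output shown in the question.

```r
# Sample data from the question
A <- data.frame(V1 = c(-0.1822, -0.4419, 0.2262),
                V2 = c(51.5307, 51.4856, 51.4535))
B <- data.frame(V1 = c(-0.4764, -0.2142, -0.2197),
                V2 = c(51.5221, 51.4593, 51.5841))

# Row-index approach: one row per (row of A, row of B) pair
cmb   <- expand.grid(seq_len(nrow(A)), seq_len(nrow(B)))
pairs <- cbind(A[cmb[, 1], ], B[cmb[, 2], ])
names(pairs) <- c("V1a", "V2a", "V1b", "V2b")  # names taken from the desired output

# Column-wise approach: every combination of the values in the four columns
all_cols <- expand.grid(cbind(A, B))

nrow(pairs)     # 3 * 3 = 9
nrow(all_cols)  # 3^4 = 81
```

Because `expand.grid` varies its first argument fastest, the rows of `pairs` come out in the same order as the desired output: all rows of A against the first row of B, then against the second, and so on.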

4
On

What you want is:

# expand.grid(A$V1,A$V2,B$V1,B$V2)
expand.grid(cbind(A,B))

But as you have figured out, the result will be huge, so I'm not sure your code will run.
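One way around the size problem is to never build the full table of pairs and instead process A in chunks, keeping only a summary per row. The following is a minimal sketch rather than a definitive implementation. It assumes that `V1` is longitude and `V2` is latitude in degrees, that a mean Earth radius of 6,371,000 m is acceptable, and that what is ultimately needed for each point of A is its nearest point in B. The helper names `haversine_m` and `nearest_in_chunks` are hypothetical, and the Haversine formula is written out in base R so the block has no package dependency.

```r
# Haversine distance in metres; all arguments are vectorised.
# lon/lat are in degrees; r is an assumed mean Earth radius in metres.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# For each row of A, find the nearest row of B. A is processed in chunks so
# that at most chunk_size * nrow(B) distances are held in memory at once.
nearest_in_chunks <- function(A, B, chunk_size = 1000) {
  n <- nrow(A)
  stopifnot(n > 0, nrow(B) > 0)
  nearest_idx  <- integer(n)
  nearest_dist <- numeric(n)
  for (start in seq(1, n, by = chunk_size)) {
    rows <- start:min(start + chunk_size - 1, n)
    # distance matrix for this chunk: rows = points of A, columns = points of B
    d <- outer(rows, seq_len(nrow(B)), function(i, j) {
      haversine_m(A$V1[i], A$V2[i], B$V1[j], B$V2[j])
    })
    idx <- max.col(-d, ties.method = "first")  # column of the smallest distance
    nearest_idx[rows]  <- idx
    nearest_dist[rows] <- d[cbind(seq_along(rows), idx)]
  }
  data.frame(nearest_b = nearest_idx, dist_m = nearest_dist)
}

# Usage on the sample data from the question
A <- data.frame(V1 = c(-0.1822, -0.4419, 0.2262),
                V2 = c(51.5307, 51.4856, 51.4535))
B <- data.frame(V1 = c(-0.4764, -0.2142, -0.2197),
                V2 = c(51.5221, 51.4593, 51.5841))
res <- nearest_in_chunks(A, B, chunk_size = 2)
res
```

If every pairwise distance is really needed rather than a per-row summary, the same loop could write each chunk's matrix `d` to disk instead of reducing it. The chunk size trades memory for the number of loop iterations.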