Fill values ​from one dataframe with another dataframe

50 Views Asked by At

I have some problems with some processing that I must do in an R environment in Databricks. I have two dataframes.

df1: contains the rows and columns (index1 and index2) and the value to be assigned

index1 index2 value
142342315 1423423341 A
142342315 14234235325 A
1423423166 14234235325 C
1423423166 1423423341 C
... ... ...

df2: generated by an array with all combinations of index

rowsname 142342315 1423423166 1423423341 14234235325
142342315 NA NA NA NA
1423423166 NA NA NA NA
1423423341 NA NA NA NA
14234235325 NA NA NA NA

The first thing that came to mind is to make a loop to assign the value to each location in the dataframe. Something like that:

for (i in 1:nrow(df1)) {
    df2[df1$index_1[i], df1$index_2[i]] <- df1$temp[i]
    df2[df1$index_2[i], df1$index_1[i]] <- df1$temp[i]
}

But, the problem is that it involves a large number of combinations, several hours of execution.

I tried using sparseMatrix but my indices cannot be converted to integers in this case. Can you think of a way to do it optimally?

I tried for loops, it's so slowly. I tried using sparseMatrix but my indices cannot be converted to integers in this case.

1

There are 1 best solutions below

2
BigFinger On

If I understand it correctly, you are looking to produce a square matrix with all names across both dimensions. I think your initial approach should work, if you preallocate the output matrix and replace the for loop with some vectorized version. How about the following?

df1 = data.frame(
    index1=c("142342315", "142342315", "1423423166", "1423423166"),
    index2=c("1423423341", "14234235325", "14234235325", "1423423341"),
    value=c("A","A","C","C")
)

rowsname = unique(c(df1$index1,df1$index2))
res = matrix(nrow=length(rowsname),ncol=length(rowsname),dimnames=list(rowsname,rowsname))

lapply(1:nrow(df1), function(r) {
    res[df1$index1[r], df1$index2[r]] <<- df1$value[r]
    res[df1$index2[r], df1$index1[r]] <<- df1$value[r]
})

Content of variable "res":

            142342315 1423423166 1423423341 14234235325
142342315   NA        NA         "A"        "A"        
1423423166  NA        NA         "C"        "C"        
1423423341  "A"       "C"        NA         NA         
14234235325 "A"       "C"        NA         NA