Preparing adjacency matrix - Filling missing links


I have a dataframe of this structure:

A <- data.frame(A = c("B",NA,NA,NA), 
                B = c("C","D",NA,NA), 
                C = c(NA,NA,NA,NA), 
                D = c("A",NA,NA,NA))

Here A, B, C and D are units of my network. My problem is that I have a link A -> B but not B -> A; the reverse link is simply not documented in the data that I have. Likewise, there is B -> D but not D -> B. I want to manipulate this data frame (or rather matrix) so that each link shows up in the columns of both units involved. The data frame should then look like this:

B <- data.frame(A = c("B","D",NA,NA), 
                B = c("A","C","D",NA), 
                C = c("B",NA,NA,NA), 
                D = c("A","B",NA,NA))

In my original data, I have around 68,000 units (columns) and around 30 rows of documented (one-sided) links. The data frame is therefore quite large, and I cannot check by hand whether each link is documented in both directions (i.e. B exists in column A, but A does not exist in column B, etc.). Keep in mind that in some cases both directions may already be documented; I cannot be sure.
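
The check described above can be sketched in base R. This is only a minimal illustration on the toy data, assuming R >= 4.0 (so that strings are not converted to factors) and assuming that no unit name contains the separator `->`; the names `edges`, `one_sided` and `missing_reverse` are made up for this example.

```r
# Toy data from above.
A <- data.frame(A = c("B", NA, NA, NA),
                B = c("C", "D", NA, NA),
                C = c(NA, NA, NA, NA),
                D = c("A", NA, NA, NA))

# One row per documented link: `ind` is the column (source), `values` the target.
edges <- na.omit(stack(A))
from <- as.character(edges$ind)
to   <- as.character(edges$values)

# A link is one-sided when its reverse "to->from" never occurs as "from->to".
# Assumption: unit names never contain "->", otherwise the keys are ambiguous.
one_sided <- !(paste(to, from, sep = "->") %in% paste(from, to, sep = "->"))

# The reverse links that would still have to be added.
missing_reverse <- data.frame(from = to[one_sided], to = from[one_sided])
missing_reverse
```

Links that are already documented in both directions do not show up in `missing_reverse`, so the result can also be used to see how many reverse links are actually absent.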

I hope I have presented my problem clearly. I'd be glad for any useful ideas.

Thanks in advance


Maël On

Here's one solution in base R:

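B <- sapply(seq_along(A), function(i){
  # names of the columns that contain unit i, i.e. the units that link to i
  name <- names(which(colSums(A == names(A)[i], na.rm = TRUE) > 0))
  # write those names into the first free (NA) slots of column i
  A[[i]][head(which(is.na(A[i])), length(name))] <- name
  # sort the column so that the NA padding ends up at the bottom
  sort(A[[i]], na.last = TRUE)
})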

setNames(data.frame(B), names(A))

     A    B    C    D
1    B    A    B    A
2    D    C <NA>    B
3 <NA>    D <NA> <NA>
4 <NA> <NA> <NA> <NA>
ThomasIsCoding On

Here is an option with igraph:

library(igraph)

g <- rev(na.omit(stack(A))) %>%
   graph_from_data_frame()

B <- list2DF(
   lapply(
      setNames(ego(g, mindist = 1), names(V(g)))[names(A)],
      \(x) `length<-`(names(x), nrow(A))
   )
)

which gives

> B
     A    B    C    D
1    B    A    B    A
2    D    D <NA>    B
3 <NA>    C <NA> <NA>
4 <NA> <NA> <NA> <NA>
tony13s On

Thanks for both of your answers. Regarding the igraph solution of @ThomasIsCoding: I tried applying your solution, but unfortunately I get an error saying:

Error in list2DF(lapply(setNames(ego(g, mindist = 1), names(V(g)))[names(DF)],  : 
  all variables should have the same length
In addition: There were 50 or more warnings (use warnings() to see the first 50)

The warnings say:

1: In FUN(X[[i]], ...) : length of NULL cannot be changed
2: In `length<-`(names(x), nrow(DF)) : length of NULL cannot be changed

And so on. I don't quite understand it. After all, it is a data frame, so the variables can't be of different lengths.
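
One plausible cause, not confirmed against the real data: a column that contains only `NA` and is never mentioned in any other column does not become a vertex of the graph, so selecting it by name yields `NULL`. `length<-` cannot pad `NULL`, and `list2DF()` then sees elements of different lengths. The following base R sketch avoids this by building the neighbour lists from a symmetrised edge list. It assumes R >= 4.0, and names such as `both` and `nb` are made up for this illustration.

```r
# Toy data from the question.
A <- data.frame(A = c("B", NA, NA, NA),
                B = c("C", "D", NA, NA),
                C = c(NA, NA, NA, NA),
                D = c("A", NA, NA, NA))

# Long edge list: one row per documented link (column name -> cell value).
edges <- na.omit(stack(A))
from <- as.character(edges$ind)
to   <- as.character(edges$values)

# Add every link in reverse and drop duplicates, so links that were already
# documented in both directions are kept only once.
both <- unique(data.frame(from = c(from, to), to = c(to, from)))

# Neighbours per unit. The factor levels guarantee one (possibly empty) entry
# for every column of A; targets that have no column of their own are dropped.
nb <- lapply(split(both$to, factor(both$from, levels = names(A))), sort)

# Pad every neighbour vector with NA to a common length and rebuild the table.
n <- max(lengths(nb), nrow(A))
B <- list2DF(lapply(nb, function(x) c(x, rep(NA_character_, n - length(x)))))
B
```

Because the padding length is the larger of `nrow(A)` and the longest neighbour list, units that gain more links than the original table has rows are not truncated. Units without any link simply end up as all-`NA` columns.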