Hierarchical tree with NA: how to create the edge list for igraph


I have a data frame in which each row is a taxon and each column is a taxonomic level describing that taxon. I need to transform it into two data frames, one of nodes and one of edges, to be used in igraph. Not all taxonomic levels (i.e. columns) are filled in for every row, so there are NAs. An example extracted from my data:

object_taxonomy = data.frame(class = c("Insecta", "Insecta", "Insecta", "Insecta", 
"Insecta", "Insecta", "Insecta", "Insecta", "Insecta", "Insecta"
), order = c("Hymenoptera", "Hemiptera", "Hemiptera", "Hemiptera", 
"Hemiptera", "Hemiptera", "Hemiptera", "Hymenoptera", "Hemiptera", 
"Hemiptera"), superfamily = c("Chalcidoidea", "Coccoidea", "Coccoidea", 
"Coccoidea", "Coccoidea", "Coccoidea", NA, "Chalcidoidea", "Coccoidea", 
"Coccoidea"), family = c("Azotidae", "Diaspididae", "Diaspididae", 
"Diaspididae", "Diaspididae", "Diaspididae", "Margarodidae", 
"Encyrtidae", "Diaspididae", "Diaspididae"), subfamily = c(NA, 
NA, NA, NA, NA, NA, NA, "Encyrtinae", NA, NA), tribe = c(NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_), 
    genus = c("Ablerus", "Chionaspis", "Diaspidiotus", "Diaspidiotus", 
    "Diaspidiotus", "Lepidosaphes", "Kuwania", "Lakshaphagus", 
    "", "Diaspidiotus"), scientificName = c("Ablerus celsus", 
    "Chionaspis salicis", "Diaspidiotus ostreaeformis", "Diaspidiotus perniciosus", 
    "Diaspidiotus prunorum", "Lepidosaphes ulmi", "Kuwania rubra", 
    "Lakshaphagus merceti", "", "Diaspidiotus gigas"))

Creating the data.frame of nodes is no problem, using gather() and filtering out NA. What I need help with is the data.frame of edges, in which an NA level is skipped so that each taxon is linked to its nearest non-missing parent.

What I would like to achieve is this:

From To
Insecta Hymenoptera
Insecta Hemiptera
Hymenoptera Chalcidoidea
Hemiptera Coccoidea
Chalcidoidea Azotidae
Chalcidoidea Encyrtidae
Hymenoptera Chalcidoidea
Hemiptera Coccoidea
Coccoidea Diaspididae
Hemiptera Margarodidae
Encyrtidae Encyrtinae
Azotidae Ablerus
Encyrtinae Lakshaphagus
Diaspididae Chionaspis
Diaspididae Diaspidiotus
Diaspididae Lepidosaphes
Margarodidae Kuwania
Ablerus Ablerus celsus
Lakshaphagus Lakshaphagus merceti
Chionaspis Chionaspis salicis
Diaspidiotus Diaspidiotus ostreaeformis
Diaspidiotus Diaspidiotus perniciosus
Diaspidiotus Diaspidiotus prunorum
Lepidosaphes Lepidosaphes ulmi
Kuwania Kuwania rubra

Answer by ThomasIsCoding

Here are two options:

  • base R approach
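`row.names<-`(
   unique(                  # drop edges that repeat across rows
      do.call(
         rbind,
         apply(object_taxonomy, 1, \(x) {
            # keep only the non-NA, non-empty levels of this row
            v <- Filter(nchar, na.omit(x))
            # link each remaining level to the next one
            data.frame(from = v[-length(v)], to = v[-1])
         })
      )
   ), NULL                  # reset row names to 1..n
)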

which gives

           from                         to
1       Insecta                Hymenoptera
2   Hymenoptera               Chalcidoidea
3  Chalcidoidea                   Azotidae
4      Azotidae                    Ablerus
5       Ablerus             Ablerus celsus
6       Insecta                  Hemiptera
7     Hemiptera                  Coccoidea
8     Coccoidea                Diaspididae
9   Diaspididae                 Chionaspis
10   Chionaspis         Chionaspis salicis
11  Diaspididae               Diaspidiotus
12 Diaspidiotus Diaspidiotus ostreaeformis
13 Diaspidiotus   Diaspidiotus perniciosus
14 Diaspidiotus      Diaspidiotus prunorum
15  Diaspididae               Lepidosaphes
16 Lepidosaphes          Lepidosaphes ulmi
17    Hemiptera               Margarodidae
18 Margarodidae                    Kuwania
19      Kuwania              Kuwania rubra
20 Chalcidoidea                 Encyrtidae
21   Encyrtidae                 Encyrtinae
22   Encyrtinae               Lakshaphagus
23 Lakshaphagus       Lakshaphagus merceti
24 Diaspidiotus         Diaspidiotus gigas
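
To see what the anonymous function does to one row, here is a minimal sketch on row 7 of the example data, where superfamily, subfamily and tribe are all NA. The vector `x` is typed in by hand for illustration, and `unname()` is added only so that the resulting data frame gets plain numeric row names.

```r
# Row 7 of object_taxonomy, typed in by hand for illustration
x <- c(class = "Insecta", order = "Hemiptera", superfamily = NA,
       family = "Margarodidae", subfamily = NA, tribe = NA,
       genus = "Kuwania", scientificName = "Kuwania rubra")

# Drop NA first, then drop empty strings:
# nchar("") is 0, which Filter() treats as FALSE
v <- unname(Filter(nchar, na.omit(x)))

# Pair every remaining level with the one that follows it
edges_row <- data.frame(from = v[-length(v)], to = v[-1])
edges_row
```

Because the NA levels are removed before the pairing, Hemiptera is linked directly to Margarodidae, which is the "skip NA" behaviour asked for in the question.
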
  • igraph approach

You can try adding one path per row of object_taxonomy:
library(igraph)

# remove `NA` or `""` from vertices
nodes <- Filter(nchar, unique(na.omit(unlist(object_taxonomy))))

# initialize graph by vertex number and then add paths iteratively
make_empty_graph(n = length(nodes)) %>%
   set_vertex_attr("name", value = nodes) %>%
   Reduce(`+`,
      apply(object_taxonomy, 1, \(x) path(Filter(nchar, na.omit(x)))),
      init = .
   ) %>%
   simplify() %>%
   as_data_frame()

which gives

           from                         to
1       Insecta                Hymenoptera
2       Insecta                  Hemiptera
3   Hymenoptera               Chalcidoidea
4     Hemiptera                  Coccoidea
5     Hemiptera               Margarodidae
6  Chalcidoidea                   Azotidae
7  Chalcidoidea                 Encyrtidae
8     Coccoidea                Diaspididae
9      Azotidae                    Ablerus
10  Diaspididae                 Chionaspis
11  Diaspididae               Diaspidiotus
12  Diaspididae               Lepidosaphes
13 Margarodidae                    Kuwania
14   Encyrtidae                 Encyrtinae
15   Encyrtinae               Lakshaphagus
16      Ablerus             Ablerus celsus
17   Chionaspis         Chionaspis salicis
18 Diaspidiotus Diaspidiotus ostreaeformis
19 Diaspidiotus   Diaspidiotus perniciosus
20 Diaspidiotus      Diaspidiotus prunorum
21 Diaspidiotus         Diaspidiotus gigas
22 Lepidosaphes          Lepidosaphes ulmi
23      Kuwania              Kuwania rubra
24 Lakshaphagus       Lakshaphagus merceti
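
Either edge table can then be handed to igraph together with a node table. The following is only a sketch under some assumptions: `tax` is a small hand-typed subset of the question's data (rows 1, 7 and 8, with a few columns left out), the edges are built with `lapply()` over row indices instead of `apply()` so that the result is a list whatever the row lengths are, and the igraph call is guarded so the base R part runs even without the package installed.

```r
# Small hand-typed subset of the question's data, for illustration only
tax <- data.frame(
  class       = "Insecta",
  order       = c("Hymenoptera", "Hemiptera", "Hymenoptera"),
  superfamily = c("Chalcidoidea", NA, "Chalcidoidea"),
  family      = c("Azotidae", "Margarodidae", "Encyrtidae"),
  subfamily   = c(NA, NA, "Encyrtinae"),
  genus       = c("Ablerus", "Kuwania", "Lakshaphagus")
)

# One small edge table per row, skipping NA and empty strings
edge_list <- lapply(seq_len(nrow(tax)), function(i) {
  x <- unlist(tax[i, ], use.names = FALSE)
  v <- x[!is.na(x) & nzchar(x)]
  data.frame(from = v[-length(v)], to = v[-1])
})

# Stack the per-row tables and drop edges shared between rows
edges <- unique(do.call(rbind, edge_list))
rownames(edges) <- NULL

# Every name that appears in an edge is a node
nodes <- data.frame(name = unique(c(edges$from, edges$to)))

# Sanity check: in a tree, every node except the root has exactly one parent
stopifnot(nrow(edges) == nrow(nodes) - 1, !anyDuplicated(edges$to))

# Build the graph only if igraph is available
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_data_frame(edges, directed = TRUE, vertices = nodes)
  print(c(vertices = igraph::vcount(g), edges = igraph::ecount(g)))
}
```

The `stopifnot()` check is a design choice rather than a requirement: it would fail if a taxon is reached through different parents in different rows, for example when a level is NA in only some of the rows that share a child, which is worth knowing before plotting the result as a tree.
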