Normalized degree centrality measures with isolated vertices/nodes (r igraph)


I am quite new to network analysis. I want to compute normalized centrality measures (degree, betweenness, and eigenvector) with R. I created the following edge list, where both ID1 and ID2 are investors. If both ID1 and ID2 are populated, the two investors co-invest; otherwise, the investor in ID1 invests alone (i.e. is an isolated node):

edgelist <- structure(list(ID1 = c("Cottonwood Capital Partners LLC", "Sequoia Capital Operations LLC", 
                   "Seraphim Capital (General Partner) LLP", "Seraphim Capital (General Partner) LLP", 
                   "Providence Equity Partners LLC", "Turn8", "Matrix Partners LP", 
                   "Zeeuws Investeringsfonds BV", "Venionaire Capital GmbH", "CincyTech", 
                   "First Round Capital", "Matrix Partners LP", "Mohr Davidow Ventures", 
                   "Esprit Capital Partners LLP", "Yaletown Venture Partners", "Wellington Partners", 
                   "Charles River Ventures LLC", "MB VENTURE PARTNERS L L C", "Edison Partners", 
                   "Ballast Point Venture Partners LLC", "Arcview Group", "Foundry Group LLC", 
                   "Sosventures LLC", "Vantagepoint Management Inc", "Bain Capital Venture Partners LLC", 
                   "NAV VC", "Bluerun Ventures LP", "Draper Fisher Jurvetson International Inc", 
                   "Claremont Creek Ventures LP", "Meritage Funds"), ID2 = c("Pangaea Ventures Ltd", 
                                                                             NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "RPM Ventures Management LLC", 
                                                                             NA, "Kennet Partners Ltd", NA, NA, "Gotham Ventures LLC", NA, 
                                                                             NA, NA, "Syncom Management Co Inc", "Lightspeed Venture Partners China Co Ltd", 
                                                                             NA, "First Round Capital", NA)), row.names = c(NA, -30L), class = c("tbl_df", 
                                                                                                                                                 "tbl", "data.frame"))

To compute normalized degree centralities, I use the following code:

library(igraph)
net <- graph.data.frame(edgelist, directed = F) # create undirected graph

"Warning message:
In graph.data.frame(edgelist, directed = F) :
In `d' `NA' elements were replaced with string "NA""

degree_norm <- degree(net, mode = "all", normalized = T) # retrieve normalized degree measure
betw_norm <- betweenness(net, directed = F, normalized = T) # retrieve normalized betweenness measure
ev <- eigen_centrality(net, directed = F, scale = F, weights = NULL) # retrieve eigenvector centrality (unscaled, since scale = F)

As you can see, a warning message appears: igraph turns each NA into the string "NA" and treats it as a separate investor. The reason I keep isolated investors is that the normalization requires dividing the raw degree (computed from the connected investors) by the number of investors with whom any investor could have co-invested (i.e. all possible investors, including those that invest alone).
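The normalization described above can be sketched by hand in base R, without igraph. This is only an illustration on a made-up edge list: the investor names "A" to "D" and the object names are hypothetical, and the divisor n - 1 (every other investor) is the convention assumed here.

```r
# Toy edge list in the same shape as the question:
# NA in ID2 marks a row where the investor in ID1 invests alone.
# Investor names "A".."D" are hypothetical.
toy <- data.frame(ID1 = c("A", "B", "C", "D"),
                  ID2 = c("B", NA, "A", NA),
                  stringsAsFactors = FALSE)

# All investors, including those that only ever invest alone
all_names <- c(toy$ID1, toy$ID2)
investors <- unique(all_names[!is.na(all_names)])
n <- length(investors)

# Keep only the rows that describe a real co-investment tie
ties <- toy[!is.na(toy$ID2), ]

# Raw degree: number of ties each investor appears in (0 for isolates);
# using the full investor list as factor levels keeps the zeros
raw_degree <- table(factor(c(ties$ID1, ties$ID2), levels = investors))

# Normalized degree: raw degree divided by the n - 1 possible partners
norm_degree <- as.numeric(raw_degree) / (n - 1)
names(norm_degree) <- investors
norm_degree
```

Here "D" never appears in a tie, so its normalized degree is 0, but it still counts toward n. Repeated ties between the same pair would be counted more than once in this sketch.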

Any suggestions on how to get around this problem? I tried working with adjacency matrices, but could not figure that out either...

Thanks a lot!

There are 1 best solutions below

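When using igraph, the way to handle nodes that have no connection to other nodes is to pass an explicit list of vertices through the `vertices` argument of `graph_from_data_frame()`, and to keep only the real ties in the edge list. Using the data you provided: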

# first get a list of all the nodes, excluding the NA
nodeslist <- data.frame(name= na.omit(unique(c(edgelist$ID1,edgelist$ID2 ))))

# delete the NAs from the edge list
edgelist <- na.omit(edgelist)

# create the `igraph` object
g <- graph_from_data_frame(edgelist,
                           directed = F,
                           vertices = nodeslist)

# compute normalized degree
V(g)$degree <- igraph::degree(g, mode = "all", normalized = T)

# explore the result: there are 34 nodes, and the (not-normalized) degree of all nodes
# is either 0 or 1. Therefore, the normalized degree should be either 1/33=0.03030303 or 0:
V(g)$degree
 [1] 0.03030303 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
 [8] 0.00000000 0.00000000 0.03030303 0.00000000 0.00000000 0.00000000 0.00000000
[15] 0.03030303 0.00000000 0.03030303 0.00000000 0.00000000 0.03030303 0.00000000
[22] 0.00000000 0.00000000 0.03030303 0.03030303 0.00000000 0.03030303 0.00000000
[29] 0.03030303 0.03030303 0.03030303 0.03030303 0.03030303 0.03030303
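The same vertex-list approach should carry over to the other two measures asked about. Below is a minimal sketch on a made-up graph (the names toy_edges, toy_nodes, toy_g and the investors "A" to "D" are hypothetical), assuming the igraph package is installed:

```r
library(igraph)

# Hypothetical toy data: A-B and B-C co-invest, D invests alone
toy_edges <- data.frame(from = c("A", "B"), to = c("B", "C"))
toy_nodes <- data.frame(name = c("A", "B", "C", "D"))

# Passing the vertex list keeps the isolate D in the graph
toy_g <- graph_from_data_frame(toy_edges, directed = FALSE,
                               vertices = toy_nodes)

# Normalized betweenness; D lies on no shortest path between other nodes
betw_norm <- betweenness(toy_g, directed = FALSE, normalized = TRUE)

# Eigenvector centrality; eigen_centrality() returns a list,
# and the per-vertex scores are in its $vector element
ev <- eigen_centrality(toy_g, directed = FALSE)$vector

betw_norm
ev
```

One caveat: eigenvector centrality is not well defined on disconnected graphs. In the data from the question every tie is a separate pair, so the leading eigenvalue is shared by several components and the scores are ambiguous; treat them with care.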