Anova of node-level Network Data in R

Question

45 Views Asked by AnCo At 17 September 2023 at 23:59

I want to perform an ANOVA on my social network data to examine whether a categorical node attribute is associated with the node's network constraint (a continuous variable).

Standard t-tests, ANOVA and linear regression are not appropriate for node-level data because the aggregated measures for each node (i.e., people surveyed) are not independent of one another. This contravenes one of the basic assumptions of linear regression analysis. Therefore, we use a permutation test to generate the significance level, which accounts for the non-independence of the cases.

I know that this analysis can be done using the UCINET program, but I would like to use R as I am doing the rest of my analysis in R.

As far as I understand, I would need to permute the graph first, then measure the node constraints, and then calculate the ANOVAs of all permuted networks.

Would anyone know how I could do this in R?

Replicable Sample

Examine if being a certain type is associated with having less constraint.

library(igraph)
library(intergraph)

g2 <- make_graph(~ A --+ B:D, B --+ D:F, C --+ B, D --+ A:B:F:E, E --+ D, F --+ A:E:G, G --+ F)
type <- c("one", "two", "three", "one", "two", "three", "one")
vertex_attr(g2, "type") <- type
constraint <- igraph::constraint(g2)
g2_N <- intergraph::asNetwork(g2)

**AnCo** · Answer 1 · 2023-09-22T18:54:54.397000

I found two potential approaches following the instructions from Hobson et al. (2021):

Linear Regression

n_perm <- 9999

# Adjacency matrix of the observed network
adj_matrix <- igraph::as_adjacency_matrix(g2, sparse = FALSE)

# Observed test statistics. With three levels of type, lm() estimates an
# intercept plus two contrasts against the reference level "one"
# (typethree and typetwo), so there are two coefficients to keep, not three.
obs <- coef(lm(constraint ~ type))[-1]

reference_table <- matrix(NA_real_, nrow = n_perm, ncol = length(obs),
                          dimnames = list(NULL, names(obs)))

# Node-label permutation: shuffle rows and columns together, then recompute
MAT_T <- sna::rmperm(adj_matrix)
for (i in seq_len(n_perm)) {
  g_perm <- igraph::graph_from_adjacency_matrix(MAT_T, mode = "directed")
  cp <- igraph::constraint(g_perm)
  reference_table[i, ] <- coef(lm(cp ~ type))[-1]
  MAT_T <- sna::rmperm(MAT_T)
}

# Include the observed values in the reference distribution
reference_table2 <- rbind(obs, reference_table)

# p-values (one-sided, upper tail): share of reference values that are
# at least as large as the observed coefficient
p_values <- colMeans(sweep(reference_table2, 2, obs, ">="))

# Histograms: observed value in red, 2.5% and 97.5% quantiles in blue
par(xpd = FALSE, mfrow = c(1, 2))
for (nm in names(obs)) {
  hist(reference_table[, nm], las = 1, xlim = c(-1, 1), col = "grey", border = NA,
       main = paste("Reference distribution", nm),
       xlab = "Test statistic values", cex.lab = 1.5)
  abline(v = obs[nm], col = "red", lwd = 4)
  abline(v = quantile(reference_table2[, nm], c(0.025, 0.975)),
         col = "darkblue", lwd = 2, lty = 2)
}
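
The histograms mark the 2.5% and 97.5% quantiles, which suggests a two-sided reading, while the p-values above are one-sided. A minimal sketch of a two-sided version follows, assuming the reference distribution already contains the observed value. The helper name `two_sided_p` is hypothetical and not part of any package used here.

```r
# Hypothetical helper (not from the original post): two-sided permutation p-value.
# obs: the observed test statistic
# ref: the reference distribution, including the observed value
two_sided_p <- function(obs, ref) {
  centre <- mean(ref)
  # Share of reference values at least as far from the centre as the observed one
  mean(abs(ref - centre) >= abs(obs - centre))
}
```

It could be called as `two_sided_p(obs[1], reference_table2[, 1])` for the first coefficient.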

ANOVA

I tried to follow a similar approach with an ANOVA instead of a linear regression. However, I was not sure which observed value to use as the test statistic, as using the p-value does not seem to make 100% sense.

# First we plot the relationship
boxplot(constraint ~ factor(type), xlab = "Type", ylab = "Constraint")

# Calculate constraint in the observed network
obs <- constraint

# We then choose our test statistic: here, the p-value of the one-way ANOVA
model.aov <- aov(obs ~ type)
summary_result <- summary(model.aov)
p_value <- summary_result[[1]]$`Pr(>F)`[1] # is this a good choice?

# Conduct the permutation
n_perm <- 999
reference <- numeric(n_perm)
adj_matrix <- igraph::as_adjacency_matrix(g2, sparse = FALSE)
MAT_T <- sna::rmperm(adj_matrix)

for (i in seq_len(n_perm)) {
  g_perm <- igraph::graph_from_adjacency_matrix(MAT_T, mode = "directed")
  cp <- igraph::constraint(g_perm)
  summary_result <- summary(aov(cp ~ type))
  reference[i] <- summary_result[[1]]$`Pr(>F)`[1]
  MAT_T <- sna::rmperm(MAT_T)
}
reference2 <- c(p_value, reference)

# We can then calculate a permutation p-value by comparing the observed p-value
# to those in the reference dataset. A small ANOVA p-value is the extreme
# direction, so we count the reference values at or below the observed one.
permutation_p <- sum(reference2 <= p_value) / length(reference2)

# Histogram: observed value in red, 2.5% and 97.5% quantiles in blue
par(xpd = FALSE)
hist(reference, las = 1, xlim = c(0, 1), col = "grey", border = NA,
     main = "Reference distribution", xlab = "Test statistic values", cex.lab = 1.5)
abline(v = p_value, col = "red", lwd = 4)
abline(v = quantile(reference2, c(0.025, 0.975)), col = "darkblue", lwd = 2, lty = 2)
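
On the question of which value to compare: one common alternative is to use the F statistic itself as the test statistic, so that "more extreme" simply means "larger". Below is a minimal sketch under that assumption. The helper name `perm_anova_f` is hypothetical. It shuffles the node-level values directly rather than permuting the adjacency matrix. Because relabelling nodes does not change any node's constraint, this should give the same reference distribution as recomputing constraint on each permuted matrix.

```r
# Hypothetical helper (not from the original post): permutation one-way ANOVA
# that uses the F statistic as the test statistic.
perm_anova_f <- function(y, group, n_perm = 999) {
  group <- factor(group)
  f_stat <- function(values) {
    summary(aov(values ~ group))[[1]][["F value"]][1]
  }
  f_obs <- f_stat(y)
  # Shuffling the node-level values across nodes acts as the node-label permutation
  f_perm <- replicate(n_perm, f_stat(sample(y)))
  # Large F is the extreme direction; the observed value counts as one permutation
  p <- (sum(f_perm >= f_obs) + 1) / (n_perm + 1)
  list(f_obs = f_obs, f_perm = f_perm, p_value = p)
}
```

With the sample data this could be run as `perm_anova_f(constraint, type)`, and `f_perm` could be passed to `hist()` in the same way as the reference distribution above.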
