R ergm - specifying nodematch effect on two attributes


I am currently working on social network data with the R ergm package. I want to estimate the conditional probability of a tie that is homophilic on two different variables, but depending on how I specify the model, the results are slightly different.

In the first case, I put two nodematch terms in my model, one for each variable that interests me, and I find the conditional log-odds of a doubly homophilic tie by summing the three coefficients of my model (the "edges" term and the two nodematch terms).

In the second case, I directly specify only one nodematch term, for ties homophilic on both variables.

The results I get, though close, are still different, even though in both cases I should get the log-odds of a tie occurring between individuals who share both attributes.

Here is an example from the Sampson data:

# Load the data:

library(statnet)

data(sampson)

#First model: I specify two nodematch terms, one for 'cloisterville' and one for 'group'.

m1 <- ergm(samplike ~ edges + nodematch('cloisterville') + nodematch('group'))

#Second model: this time, I have only one `nodematch` term, which requires a match on both attributes at the same time.

m2 <- ergm(samplike ~ edges + nodematch(c('cloisterville','group')))

#Here is the output of both models:

summary(m1)

summary(m2)

So according to the first model, the conditional log-odds of a tie that is homophilic on both variables should be:

-2.250 + 0.586 + 2.389

That is, 0.725

However, according to the second model, the log-odds of this same doubly homophilic tie should be:

-1.856 + 2.659

That is, 0.803

The corresponding probabilities are 0.6737071 and 0.6906158.
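For reference, the conversion from log-odds to probability can be reproduced in base R with plogis(), the inverse logit. This is only a sketch of the arithmetic above, using the rounded coefficients reported in the two summaries; the variable names are illustrative.

```r
# Rounded coefficients as reported by summary(m1) and summary(m2)
logodds_m1 <- -2.250 + 0.586 + 2.389  # edges + nodematch(cloisterville) + nodematch(group)
logodds_m2 <- -1.856 + 2.659          # edges + joint nodematch term

# plogis() is the inverse logit: exp(x) / (1 + exp(x))
p_m1 <- plogis(logodds_m1)
p_m2 <- plogis(logodds_m2)

print(c(model1 = p_m1, model2 = p_m2))
```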

Do you know why the results differ between the two cases, when both should give the same conditional probability for the same kind of tie?

Thank you so much for your help,

Kind regards

Timothée

paqmo On

We should not expect the same results, since the models are evaluating two different things. In essence, model 1 estimates separate homophily effects for cloisterville and for group, while model 2 estimates a single effect for ties that match on both cloisterville and group at once.

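To be more precise, the first model tests homophily on group, net of the tendency toward homophily on cloisterville, and vice versa. The second model looks at whether there is a tendency toward homophily on both attributes at the same time: do monks form ties within groups and based on their location in the cloisters? In the second model, ties that match on only one attribute are absorbed into the baseline edges term, which is why its edges coefficient (-1.856) differs from the first model's (-2.250).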
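One way to see the difference is to write out the log-odds each model implies for the four possible kinds of dyad. The sketch below uses only base R and the rounded coefficients from the question; the object and column names are made up for illustration.

```r
# Rounded coefficients taken from the question's two summaries
coef_m1 <- c(edges = -2.250, match_cloisterville = 0.586, match_group = 2.389)
coef_m2 <- c(edges = -1.856, match_both = 2.659)

# The four kinds of dyad, by which attributes the two monks share
dyads <- data.frame(
  same_cloisterville = c(FALSE, TRUE, FALSE, TRUE),
  same_group         = c(FALSE, FALSE, TRUE, TRUE)
)

# Model 1: each match adds its own coefficient to the baseline
dyads$logodds_m1 <- coef_m1[["edges"]] +
  dyads$same_cloisterville * coef_m1[["match_cloisterville"]] +
  dyads$same_group * coef_m1[["match_group"]]

# Model 2: only the double match shifts the log-odds;
# the other three rows all share one baseline
dyads$logodds_m2 <- coef_m2[["edges"]] +
  (dyads$same_cloisterville & dyads$same_group) * coef_m2[["match_both"]]

print(dyads)
```

Model 1 distinguishes four levels using three parameters, so the doubly matched row is constrained to be the sum of the two separate effects. Model 2 distinguishes only two levels, so its baseline is an average over the three non-doubly-matched rows.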

See the note in ?ergm.terms for nodematch:

(When multiple names are given, the statistic counts only those on which all the named attributes match.)
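The statistics themselves can be inspected without fitting anything, since summary() on an ergm formula returns the observed counts. The following is a minimal sketch, assuming the ergm package is installed and that nodematch() accepts a vector of attribute names as in the question; `stats` is just an illustrative name.

```r
library(ergm)   # assumed installed; data(sampson) ships with it
data(sampson)   # provides the samplike network used in the question

# Observed network statistics for the terms used in both models
stats <- summary(samplike ~ edges +
                   nodematch('cloisterville') +
                   nodematch('group') +
                   nodematch(c('cloisterville', 'group')))
print(stats)
```

The joint statistic can never exceed either single-attribute statistic, because every tie that matches on both attributes also matches on each one separately.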

This is easy to see visually:

[Figure: plot of the Sampson network, with nodes colored by group and shaped by cloisterville]

The colors are groups. Squares mean cloisterville==TRUE and triangles mean cloisterville==FALSE. The term nodematch(c('cloisterville','group')) counts only those edges where both colors and shapes match!
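The same count can be rebuilt by hand from the adjacency matrix, which may make the "colors and shapes both match" rule more concrete. This is a sketch under the assumption that ergm (and with it the network package) is installed; all object names are illustrative.

```r
library(ergm)   # assumed installed; attaches the network package as well
data(sampson)

adj   <- as.matrix(samplike)            # adjacency matrix of the directed network
group <- samplike %v% 'group'           # vertex attribute: group
clois <- samplike %v% 'cloisterville'   # vertex attribute: cloisterville

# TRUE where sender and receiver share BOTH attributes
same_both <- outer(group, group, "==") & outer(clois, clois, "==")
diag(same_both) <- FALSE                # a monk cannot nominate himself

n_edges_both <- sum(adj[same_both])     # observed ties among doubly matched pairs
n_dyads_both <- sum(same_both)          # possible ties among doubly matched pairs

# Compare with the statistic ergm computes for the joint nodematch term
stat_both <- summary(samplike ~ nodematch(c('cloisterville', 'group')))
print(c(by_hand = n_edges_both, ergm_statistic = unname(stat_both)))

# Observed share of doubly matched pairs that are tied
print(n_edges_both / n_dyads_both)
```

Because model 2 is dyad-independent with a single binary covariate, its fitted probability for a doubly matched tie should, if this reading is right, reproduce that observed share up to rounding. Model 1 forces the two effects to be additive on the log-odds scale, so it need not.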