I'm trying to fit a mixed-effects model to some data I'm analyzing. It previously worked as a fixed-effects model, before I changed one of the variables (countryfactor) to a random effect (random intercept). The issue is that when I run it I get the following message:
"Error: Invalid grouping factor specification, countryfactor".
I've seen on other posts that this is usually an issue with there being NA entries, but I've checked all the variables in my model and none have any NA entries.
Does anyone know what might be causing this error message? Posted the model code below.
glmer(
  formula =
    as.numeric(wheezing_InD) ~
      as.factor(mainfuel) +
      age_InD +
      as.factor(gender_InD) +
      as.factor(school_level_InD3) +
      as.factor(enough_money_InD) +
      as.factor(cooking_location_InD) +
      as.factor(other_smokers_household_InD) +
      as.factor(AnyCondition) +
      as.factor(owned_items_electricity_connection_R) +
      as.factor(HealthAdviceFull) +
      (1|countryfactor),
  family=poisson(link="log"),
  data = Data46)
Update:
I tried a simpler model, with just the first 20 rows and the following 3 columns.
glmer(
  formula =
    as.numeric(wheezing_InD) ~
      age_InD +
      (1|countryfactor),
  family=poisson(link="log"),
  data = Data46)
I still get the same error message. Here is a sample of the first 20 rows with these 3 variables, using dput:
structure(list(wheezing_InD = c("No", "Yes", "Yes", "No", "No",
"No", "No", "No", "Yes", "No", "No", "No", "No", "Yes", "No",
"No", "Yes", "Yes", "Yes", "No"), age_InD = c(55L, 24L, 23L,
30L, 40L, 43L, 37L, 38L, 18L, 23L, 28L, 33L, 27L, 54L, 23L, 23L,
42L, 48L, 31L, 18L), countryfactor = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), .Label = c("cameroon", "ghana", "kenya"), class = "factor")), row.names = c(NA,
20L), class = "data.frame")
I have also attached the str output:
'data.frame': 20 obs. of 3 variables:
$ wheezing_InD : chr "No" "Yes" "Yes" "No" ...
$ age_InD : int 55 24 23 30 40 43 37 38 18 23 ...
$ countryfactor: Factor w/ 3 levels "cameroon","ghana",..: 1 1 1 1 1 1 1 1 1 1 ...
If this is really what your data look like (i.e. the response variable wheezing_InD is a character vector and not a factor), then as.numeric(wheezing_InD) will convert the entire response vector to NAs. Admittedly, lme4 could provide a more informative error message here.
Binomial responses can be specified in most R modeling functions very flexibly (I would say too flexibly).
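The NA conversion described above is easy to confirm in base R. This is a minimal sketch that uses only the first five response values from the posted sample rather than the full Data46:

```r
# First five response values from the posted sample (a character vector)
wheezing_InD <- c("No", "Yes", "Yes", "No", "No")

# Coercing non-numeric strings to numeric yields NA (R also emits a warning,
# suppressed here to keep the output short)
converted <- suppressWarnings(as.numeric(wheezing_InD))
print(converted)

# TRUE: every value of the response is missing after the conversion
all(is.na(converted))
```

Checking the original column for NAs will not catch this, because the missing values only appear after as.numeric() is applied inside the formula.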
Let's consider your options:
- wheezing_InD alone will give an error (it's a character, which isn't in the allowed set).
- as.factor(wheezing_InD) or factor(wheezing_InD) should work fine (option 1 above: the model will estimate the proportion of "Yes" values, since R will use alphabetical order to make "No" the first level and "Yes" the second).
- as.numeric(factor(wheezing_InD)) - 1 is OK: as.numeric(as.factor()) converts ("No", "Yes") to (1, 2), and subtracting 1 gives (0, 1). (This is option 2; we don't need "weights" because we only have 1 'trial' per observation, i.e. Bernoulli/binomial with n = 1.) Option 3 is really only relevant for binomial data with N > 1.
- as.numeric(factor(wheezing_InD)) seems weird to me, as it will result in (1, 2) responses, which should give you an error?
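The conversions discussed in this answer can be sketched in base R as follows. The vector is again the first five response values from the posted sample. The names resp_factor, resp_codes and resp_01 are made up for illustration, and the commented-out refit at the end is an assumption (it switches to family = binomial to match the binomial framing here, whereas the question used poisson), not a tested call on your data:

```r
wheezing_InD <- c("No", "Yes", "Yes", "No", "No")

# Factor response: levels are sorted alphabetically, so "No" comes first
resp_factor <- factor(wheezing_InD)
print(levels(resp_factor))

# as.numeric() on a factor returns the integer level codes (1 = "No", 2 = "Yes")
resp_codes <- as.numeric(resp_factor)
print(resp_codes)

# Subtracting 1 gives a 0/1 response (0 = "No", 1 = "Yes")
resp_01 <- resp_codes - 1
print(resp_01)

# Hypothetical refit, assuming lme4 is loaded and Data46 is the full data set
# with more than one country represented:
# glmer(as.numeric(factor(wheezing_InD)) - 1 ~ age_InD + (1 | countryfactor),
#       family = binomial, data = Data46)
```

Whether a binomial or a Poisson family is the right choice for your analysis is a separate modeling decision. The point of the sketch is only that the response has to go through factor() before as.numeric().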