I have a mixed-effects logistic regression model:
quietly melogit y i.x1 i.x2 || x3:
Variable x1 is coded 0/1. I create the predicted probabilities for both values of x1:
margins x1
Then I obtain the predicted probabilities for each observation included in the model:
predict probhat if e(sample)
summarize probhat
To make out-of-sample predictions, I load my second dataset with the same variables:
use "C:\file path\newdata.dta", clear
Now I can get the predicted probabilities for each observation in the new dataset:
predict probhat_new
summarize probhat_new
My question is: how do I get, for the new dataset, the output that the 'margins' command produced for the original dataset? Running the same command on the new data:
margins x1
Stata returns:
e(sample) does not identify the estimation sample
I also tried to recreate the original 'margins' output by calculating the mean of probhat for each value of x1, hoping to use the same approach to get out-of-sample predicted probabilities by subgroup:
summarize probhat if x1== 0, meanonly
scalar mean_probhat_x1_0 = r(mean)
gen mean_probhat=.
replace mean_probhat = mean_probhat_x1_0 if x1== 0
summarize mean_probhat
However, the mean from this code differs from the 'margins' result for x1 == 0.
I also tried an alternative approach:
egen mean_probhat = mean(probhat), by(x1)
tab mean_probhat
This does not reproduce the 'margins' results either.
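A likely reason for the mismatch: both attempts average probhat only over the observations that actually have x1 == 0 (or x1 == 1), which is a plain subgroup mean. 'margins x1' instead reports predictive margins: it sets x1 to 0 for every observation in the estimation sample, predicts, and averages over all of them, then repeats with x1 set to 1. The type of prediction matters too: by default, predict after melogit conditions on empirical Bayes predictions of the random effects, which may not be the statistic 'margins' uses by default (see help melogit postestimation). The sketch below assumes the melogit model above is still the active estimation result and requests the fixed-portion prediction (mu fixedonly) explicitly in both commands so the two computations are on the same footing; choosing fixedonly is an assumption made for comparability, not necessarily what the original 'margins x1' call used.

```stata
* Sketch: reproduce predictive margins for x1 by hand.
* Assumption: the melogit model above is the active estimation result.
* Assumption: mu fixedonly is requested in BOTH commands for comparability.
margins x1, predict(mu fixedonly)

preserve
foreach v in 0 1 {
    * Counterfactual: give every observation the same value of x1
    quietly replace x1 = `v'
    quietly predict double p`v' if e(sample), mu fixedonly
    * Average over the whole estimation sample, not only obs with x1 == `v'
    summarize p`v', meanonly
    display "x1 = `v': average predicted probability = " %6.4f r(mean)
}
* restore brings back the original x1 values and drops p0 and p1
restore
```

The two averages printed by the loop should match the 'margins' output from the first line, because both average the same fixed-portion predictions over the same estimation sample.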
You can use estimates esample: to reset the estimation sample; see help estimates esample. As the help file explains, you can easily amend the command to specify a subsample (e.g., observations with non-missing values in specific variables), but here I'll just set the whole dataset as the estimation sample.

Minimum reproducible example: