How to get out-of-sample predictions for specific subgroups, in Stata?


I have a mixed effects logistic regression model:

quietly melogit y i.x1 i.x2 || x3:

Variable x1 is coded 0/1. I create the predicted probabilities for both values of x1:

margins x1

Then I obtain the predicted probabilities for each observation included in the model:

predict probhat if e(sample)
summarize probhat

To make out-of-sample predictions, I load my second dataset with the same variables:

use "C:\file path\newdata.dta", clear

Now I can get the predicted probabilities for each observation in the new dataset:

predict probhat_new
summarize probhat_new

My question is: How do I get what the 'margins' command created for the original dataset, but for the new dataset?

margins x1

Stata returns:

e(sample) does not identify the estimation sample

I also tried to recreate original output based on 'margins' by calculating the mean of probhat for each value of x1, hoping that I could use the same approach to get out-of-sample subgroup predicted probabilities:

summarize probhat if x1== 0, meanonly
scalar mean_probhat_x1_0 = r(mean)

gen mean_probhat=.
replace mean_probhat = mean_probhat_x1_0 if x1== 0
summarize mean_probhat

However, the mean based on this code is different from the mean for x1==0 based on the 'margins' command.

I also tried an alternative approach:

egen mean_probhat_by_x1 = mean(probhat), by(x1)
tab mean_probhat_by_x1

But this also doesn't produce the correct results.

Answer:

margins fails here because loading the new dataset clears e(sample). You can use estimates esample: (note the trailing colon) to reset the estimation sample; see help estimates esample. As the help file explains, you can restrict the new estimation sample to a subsample (e.g. observations with non-missing values in specific variables), but here I'll just set the whole dataset as the estimation sample.
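
Applied to the setup in the question, a sketch of the subsample variant might look like the following. It assumes the melogit results are still in memory when the new dataset is loaded, and that the new data contain y, x1, x2, and x3 under the same names as the original data. Listing those variables should restrict e(sample) to observations with no missing values in any of them.

```stata
* Sketch only: assumes the melogit estimates are still in memory
* and that newdata.dta uses the same variable names as the original data.
use "C:\file path\newdata.dta", clear

* Mark as the estimation sample every observation with non-missing
* values in the model variables (plain names, no i. prefixes)
estimates esample: y x1 x2 x3

* margins now averages over the marked observations in the new dataset
margins x1
```

An if or in qualifier can be added after the variable list to narrow the sample further.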

Minimal reproducible example:

webuse bangladesh, clear
qui: melogit c_use i.urban age i.children || district:
margins urban

webuse bangladesh, clear    // clears the estimation sample: e(sample) == 0 for all obs
estimates esample:    // resets estimation sample: e(sample) == 1 for all obs
margins urban
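
On the side question of why the subgroup means of probhat differ from the margins output: margins urban does not average predictions within each subgroup. For each level of urban it treats every observation in the estimation sample as if it had that level, then averages the predictions. In addition, in recent Stata versions margins after melogit appears to report marginal predictions (check the Expression line of the output), whereas predict by default conditions on the estimated random effects. The sketch below illustrates the difference under those assumptions; the variable names p_urban0 and p_default are made up for illustration, and the marginal option of predict may not be available in older releases.

```stata
webuse bangladesh, clear
quietly melogit c_use i.urban age i.children || district:
margins urban

* Hand-computed counterpart of the urban = 0 row (sketch):
* set urban to 0 for everyone, then average the marginal predictions
preserve
replace urban = 0
predict double p_urban0 if e(sample), mu marginal
summarize p_urban0
restore

* By contrast, this averages the default predictions, which include
* each district's estimated random effect, over the urban == 0 subgroup only
predict double p_default if e(sample)
summarize p_default if urban == 0
```

If the first summarize agrees with the margins row and the second does not, the discrepancy in the question comes from what is being averaged, not from an error in the model.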