MAP estimate versus Mean of Posterior Disagree


Shouldn't the MAP estimate be close to the center of the histogram of the trace samples?

I see close agreement between the find_MAP estimate and the histogram of the trace samples when modeling my simulated data, but wild differences with the real data. All the marginal distributions appear to be unimodal, and there are no divergences in the sampling. I know the two don't have to be the same, but everything else in this model is behaving well; only the trace samples seem to be offset from the MAP estimate, by perhaps 2x the standard deviation of the samples.
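
To put a number on that offset, one option is to express the gap between the point estimate and the sample mean in units of the sample standard deviation. The sketch below is only an illustration: `offset_in_sd` is a hypothetical helper, and the draws and MAP value are made-up numbers, not values from the model.

```python
import statistics


def offset_in_sd(samples, point_estimate):
    """Signed gap from the sample mean to a point estimate, in sample SDs."""
    mean = statistics.fmean(samples)
    sd = statistics.stdev(samples)  # sample standard deviation (n - 1)
    return (point_estimate - mean) / sd


# Made-up numbers for illustration only (not from the real data).
draws = [9.0, 10.0, 11.0, 10.5, 9.5]
map_value = 12.0

print(offset_in_sd(draws, map_value))
```

With a real trace, `samples` would be the flattened draws for one parameter and `point_estimate` the matching entry of the dictionary returned by find_MAP.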

I've built a model that connects cross-country runner times to individual runner abilities and course difficulties. There are also linear factors that account for the improvement runners see from year to year, and from month-to-month in the season. I've got plenty of data. Clearly the real world is messier than my simulated data.

I can't share the data, but the simulated data and the model are on GitHub.

What can I do to debug this model (and the real data)? Except for one parameter (the constant term), the r_hat values are all less than 1.1. Do any of the other numbers look suspicious?

I'm flummoxed. Thanks for any suggestions.

-- Malcolm

                     mean     sd   hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
bias              393.221  9.591  378.139  411.427      3.885    2.908       8.0      49.0   1.20
monthly_slope      -7.383  0.536   -8.434   -6.440      0.037    0.027     214.0    1150.0   1.01
yearly_slope       -6.370  0.518   -7.323   -5.361      0.043    0.031     147.0     726.0   1.02
course_est[0]       1.521  0.074    1.379    1.653      0.019    0.014      15.0     115.0   1.09
course_est[1]       1.403  0.052    1.316    1.507      0.019    0.014       8.0      80.0   1.19
...
runner_est[1936]    1.583  0.088    1.409    1.738      0.003    0.002     981.0    1434.0   1.01
runner_est[1937]    1.616  0.041    1.539    1.692      0.008    0.006      25.0     505.0   1.05
runner_est[1938]    1.546  0.030    1.488    1.600      0.008    0.006      16.0      95.0   1.08
runner_est[1939]    1.790  0.094    1.604    1.954      0.005    0.003     400.0    1196.0   1.01
eps                 5.394  0.045    5.313    5.480      0.001    0.001    1498.0    1457.0   1.00

2270 rows × 9 columns
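
One way to screen a summary like this for suspicious rows is to flag any parameter with a high r_hat or a low effective sample size. The sketch below copies the visible rows by hand. The thresholds (r_hat above 1.01, ESS below 400) are assumptions taken from commonly quoted MCMC diagnostic guidance rather than from this post, and `flag_rows` is a hypothetical helper.

```python
# (name, ess_bulk, ess_tail, r_hat) copied from the visible summary rows.
rows = [
    ("bias", 8.0, 49.0, 1.20),
    ("monthly_slope", 214.0, 1150.0, 1.01),
    ("yearly_slope", 147.0, 726.0, 1.02),
    ("course_est[0]", 15.0, 115.0, 1.09),
    ("course_est[1]", 8.0, 80.0, 1.19),
    ("runner_est[1936]", 981.0, 1434.0, 1.01),
    ("runner_est[1937]", 25.0, 505.0, 1.05),
    ("runner_est[1938]", 16.0, 95.0, 1.08),
    ("runner_est[1939]", 400.0, 1196.0, 1.01),
    ("eps", 1498.0, 1457.0, 1.00),
]

R_HAT_MAX = 1.01  # assumed threshold, stricter than the 1.1 used in the post
ESS_MIN = 400.0   # assumed minimum for both bulk and tail ESS


def flag_rows(rows, r_hat_max=R_HAT_MAX, ess_min=ESS_MIN):
    """Return names whose r_hat is too high or whose ESS is too low."""
    flagged = []
    for name, ess_bulk, ess_tail, r_hat in rows:
        if r_hat > r_hat_max or min(ess_bulk, ess_tail) < ess_min:
            flagged.append(name)
    return flagged


flagged = flag_rows(rows)
print(flagged)
```

Under these assumed thresholds most of the visible rows are flagged, and bias and course_est[1] also exceed the looser 1.1 cutoff. That pattern may mean the chains have not mixed well enough for the sample means to be reliable either.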

Answer by Malcolm Slaney

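This is a partial answer: don't trust point estimates like the MAP. The MAP is the mode of the joint posterior, and in a model with thousands of parameters that mode need not sit near the marginal means of the samples, even when every marginal looks unimodal. See this discussion: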

https://discourse.pymc.io/t/how-to-reference-posterior-mode-value-without-find-map/3632
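
As a toy illustration of how a joint mode can land far from the mean of a well-behaved marginal, here is a minimal sketch using a funnel-shaped density (v ~ Normal(0, 3), x_i | v ~ Normal(0, exp(v/2))). This is not the runner model from the question. The funnel, the constants, and the function name are all assumptions chosen for the example.

```python
import math
import random

N_X = 2     # number of x coordinates in the toy model (assumption)
V_SD = 3.0  # standard deviation of v (assumption)


def log_joint_at_x_zero(v, n_x=N_X, v_sd=V_SD):
    """Log joint density, up to a constant, at x = 0 (its peak in x for any v)."""
    # log Normal(v; 0, v_sd) contributes -v^2 / (2 v_sd^2);
    # each log Normal(0; 0, exp(v/2)) contributes -v / 2.
    return -v * v / (2.0 * v_sd * v_sd) - n_x * v / 2.0


# Joint mode in v by grid search; analytically it is -n_x * v_sd**2 / 2 = -9.
grid = [i / 100.0 for i in range(-3000, 1001)]
map_v = max(grid, key=log_joint_at_x_zero)

# The marginal of v under the joint is just Normal(0, v_sd), so draw it directly.
rng = random.Random(0)
v_draws = [rng.gauss(0.0, V_SD) for _ in range(50_000)]
mean_v = sum(v_draws) / len(v_draws)
sd_v = math.sqrt(sum((v - mean_v) ** 2 for v in v_draws) / (len(v_draws) - 1))

offset = (map_v - mean_v) / sd_v
print(f"joint-mode v = {map_v:.2f}, sample mean of v = {mean_v:.3f}, "
      f"offset = {offset:.2f} sd")
```

Here the marginal of v is a single symmetric bump centered near zero, yet the joint mode sits about three standard deviations away. Whether something similar is happening in the real model would need to be checked separately.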