Truncated Normal MLE


I want to do a Maximum Likelihood estimation for a truncated normal distribution. First I generate data with two simple equations, X_1 = X_0 + E_1 and X_2 = X_0 + E_2, and truncate the sample by keeping only the observations where X_1 exceeds a cut-off point. Here X_0, E_1 and E_2 are normally distributed and I use rnorm to generate the values; as cut-off point I use x_0_cut = mu + 2*sqrt(sigma_0^2 + sigma_e^2), i.e. two standard deviations of X_1 above its mean. After that I estimate the parameters of X_1 with the "mledist" function and get really bad results. To check the problem I repeated the simulation 100 times and stored the estimates of mu in storage_mu; they are all equally far off. The code I used for the simulation is below.

library(fitdistrplus)   # provides mledist
library(truncnorm)      # provides dtruncnorm/ptruncnorm for the "truncnorm" distribution

m = 100
i = 1
storage_mu = rep(0,100)


while(i <= m){
#Parameters
N = 10000
mu = 10
sigma_0 = 1
sigma_e = 1
  
X_0 = rnorm(N, mean = mu, sd = sigma_0)
E_1 = rnorm(N, mean = 0, sd = sigma_e)
E_2 = rnorm(N, mean = 0, sd = sigma_e)
X_1 = X_0 + E_1
X_2 = X_0 + E_2
x_bi = cbind(X_1, X_2)
  
  
#Cut-Off and truncation
x_0_cut = mu + 2 * sqrt(sigma_0^2 + sigma_e^2)
x_bi_trunc = subset(x_bi, x_bi[,1] >= x_0_cut)
  
#Select n observations
#x_bi_sort = x_bi_trunc[order(x_bi_trunc[,1]),] 
n = 2500
x_bi_final = head(x_bi_trunc, n)
  
# extract the two columns for convenience
x_1 = x_bi_final[,1]
x_2 = x_bi_final[,2]
  
x_1_bar = mean(x_1) 
sd_1 = sqrt(var(x_1)*(n-1)/n)

  

# Compute the MLE for the univariate truncated normal
fit = mledist(x_1, "truncnorm", start = list(mean = x_1_bar, sd = sd_1))$estimate
mu_1_hat = fit[1]
sigma_1_hat = fit[2]

storage_mu[i] = mu_1_hat

i = i +1
}
storage_mu

Below is the output for the 100 estimates of mu. The bias is about +3.

[1] 13.36466 13.32620 13.26061 13.35483 13.33904 13.32613 13.35538 13.41398
[9] 13.34077 13.31515 13.36057 13.36290 13.37306 13.37546 13.35423 13.35764
[17] 13.35481 13.31707 13.35890 13.34394 13.44947 13.35364 13.36820 13.35658
[25] 13.36899 13.33603 13.33170 13.40991 13.35209 13.35108 13.36597 13.35709
[33] 13.29799 13.36056 13.33210 13.34038 13.35774 13.41122 13.32358 13.37611
[41] 13.34839 13.33610 13.30683 13.31361 13.37794 13.36822 13.42950 13.39687
[49] 13.38384 13.32224 13.32566 13.37058 13.29208 13.40263 13.32874 13.38417
[57] 13.31360 13.34144 13.39018 13.34612 13.35063 13.35206 13.37924 13.43540
[65] 13.33795 13.35608 13.32996 13.35488 13.32289 13.38497 13.33944 13.36007
[73] 13.36286 13.34323 13.34397 13.33800 13.38267 13.32556 13.35907 13.32954
[81] 13.38601 13.32422 13.38291 13.32262 13.39936 13.34364 13.39338 13.34509
[89] 13.30802 13.39126 13.35310 13.29634 13.38961 13.33402 13.31962 13.38288
[97] 13.33798 13.31185 13.30590 13.32078
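
If I'm not mistaken, 13.36 is essentially just the mean of the truncated sample itself: plugging my parameters into the standard formula for the mean of a left-truncated normal gives roughly the same value (a quick sanity check, using base R only):

# mean of N(10, sqrt(2)) left-truncated at mu + 2*sigma
mu    = 10
sigma = sqrt(1^2 + 1^2)        # sd of X_1 = X_0 + E_1
a     = mu + 2 * sigma         # cut-off used above
alpha = (a - mu) / sigma
mu + sigma * dnorm(alpha) / (1 - pnorm(alpha))
# roughly 13.36, matching the estimates above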

I tried to get a better approximation by playing with the options of "mledist"; for example, I fixed the value of a at the cut-off point x_0_cut that I'm using, but then the values get even worse. The call looks like this:

mledist(x_1, "truncnorm", start = list(mean = x_1_bar, sd = sd_1), fix.arg = list(a=x_0_cut))$estimate

with these results:

[1]  10.140061223  12.089945831  11.048595750  11.581995517  11.129860901
[6]   9.418521020   9.608019087   9.929831748   4.699461364  11.267935812
[11]  11.714949354  11.762986025  10.202113218  12.369248730  11.844003279
[16]   8.687867353  11.314833612  11.690386639   4.494877436   8.705778578
[21]  11.764867122   9.981540165  11.328531141   9.703258348  11.655422495
[26]  12.073462623   8.681263340  11.024820357  12.068915824  10.770174316
[31]   5.468735578   9.840865977  12.186883743   8.993979680   5.369682551
[36]   3.479905781  11.499858120   8.020687499  11.853293176  10.506828481
[41]  10.231765273  10.511779270  10.411593933  11.773604687   9.777336311
[46]  10.853945953  11.327343019  11.089845843   2.991276391   0.151239390
[51]   9.489660148  11.726098700  11.509123940   7.453213566  11.932400933
[56]   4.188084979  10.977560797  -0.007370121  10.904831509   6.100907337
[61]  11.727064157  10.029395630   5.662605692   9.946740799  12.359169774
[66]  -2.686121867  10.387318263  11.170782824   9.334853648   9.839261227
[71]   7.703257871   8.971979528   8.753366471  12.486250630  11.729089349
[76]   7.663378529  -1.581499286  10.500557773   9.504805508 -15.956289261
[81] -13.784320458   9.248311024  11.267769705   9.608263889   8.128767660
[86]  11.186614967  10.300034199  10.912469218  10.975790552  10.304912099
[91]   8.654392575   7.547824160  11.879473071  11.379844872   8.978183752
[96]  11.564179091  10.771436935  11.132971477   9.125936329  -7.406405917

If anyone has the time and motivation to help me, it would be very much appreciated!

Regards, terrabowing

There is 1 answer below.


Truncated normal MLE using optim:

tnorm.mle <- function(x, a = -Inf, b = Inf) {
  # parameter estimation for truncated normal data with known truncation bounds a and b
  
  n <- length(x)
  ab <- c(a, b)
  init <- c(x[1], log(sd(x)))  # starting values: first observation for mu, log of sample sd for sigma
  
  if (is.finite(a)) {
    if (is.finite(b)) {
      # two-sided truncation
      f <- function(params) {
        mu <- params[1]
        sigma <- exp(params[2])
        n*log(diff(pnorm(ab, mu, sigma))) - sum(dnorm(x, mu, sigma, TRUE))
      }
    } else {
      # left-truncated
      f <- function(params) {
        mu <- params[1]
        sigma <- exp(params[2])
        n*pnorm(a, mu, sigma, FALSE, TRUE) - sum(dnorm(x, mu, sigma, TRUE))
      }
    }
  } else {
    if (is.finite(b)) {
      # right-truncated
      f <- function(params) {
        mu <- params[1]
        sigma <- exp(params[2])
        n*pnorm(b, mu, sigma, TRUE, TRUE) - sum(dnorm(x, mu, sigma, TRUE))
      }
    } else {
      # non-truncated normal
      return(c(mu = mean(x), sigma = sd(x)))
    }
  }
  
  # solve for mu and sigma
  params <- optim(init, f)$par
  c(mu = params[1], sigma = exp(params[2]))
}
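
Note that the function optimizes over log(sigma), so optim's unconstrained search can never propose a negative standard deviation; the estimate is exponentiated back before being returned. Each f above is the negative log-likelihood of the corresponding truncated normal, so optim's default minimization gives the MLE.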

Testing:

set.seed(1288276868)

a <- 4
b <- 10
mu <- 2
sigma <- 3
x <- qnorm(runif(1e6, pnorm(a, mu, sigma), pnorm(b, mu, sigma)), mu, sigma)
tnorm.mle(x, a, b)
#>       mu    sigma 
#> 2.034619 2.987204

x <- qnorm(runif(1e6, pnorm(a, mu, sigma)), mu, sigma)
tnorm.mle(x, a)
#>       mu    sigma 
#> 2.012977 2.998484

x <- qnorm(runif(1e6, 0, pnorm(b, mu, sigma)), mu, sigma)
tnorm.mle(x, b = b)
#>       mu    sigma 
#> 1.999469 3.000672
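
Applied to the setup from the question, this would look roughly as follows (a sketch, assuming the same parameters mu = 10, sigma_0 = sigma_e = 1 and the cut-off mu + 2*sqrt(2); only the left bound is known, so b stays at its default Inf):

mu <- 10
sigma_0 <- 1
sigma_e <- 1
x_0_cut <- mu + 2 * sqrt(sigma_0^2 + sigma_e^2)

X_0 <- rnorm(1e6, mu, sigma_0)
X_1 <- X_0 + rnorm(1e6, 0, sigma_e)
x_1 <- X_1[X_1 >= x_0_cut]      # only ~2% of draws survive the cut-off

tnorm.mle(x_1, a = x_0_cut)
# estimates should land near mu = 10 and sigma = sqrt(2) ~ 1.41,
# though they are noisy because only the upper tail is observed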