Fitting discrete (negative binomial) distribution for early data values

I'm having some difficulty fitting a discrete distribution (specifically, the negative binomial distribution). Here's my setting: I have a source of incoming items, each with an unknown lifetime. Every day, some expire (a big portion on the first day, some more on the second day, etc.). For an existing source of incoming items (one older than 180 days), I've managed to model the lifespan of a new item with the negative binomial distribution to within an acceptable error, using maximum likelihood estimation (MLE).

My problem starts with new sources of incoming items. I want to estimate their items' lifetime distribution after a short time (say, after 5-7 days). When I apply the MLE, I get significantly lower means (e.g. 3 instead of 30). I assume this is because the MLE doesn't account for the fact that the mass on the last day (the 7th day) is really 1-CDF(6), where CDF is the cumulative distribution function over the previous 6 days, so it also contains items that are still alive.

Is there a good approach to fit the discrete distribution only based on the early data values and the sum of the mass of the other values? I could write some optimization function for it and only give weight to the 6 previous days, but I feel it will give me sub-optimal performance.

I'm OK with a theoretical explanation, but if you can point to specific functions or libraries, I can work in Matlab, R, Python and C#.

There are 1 best solutions below

The problem you have encountered is called "censored" data. Essentially, at a given point in time you know only that the lifetime of some items is greater than (now minus start time). Your guess about how to correct the likelihood function points in the right direction. I think censored data are usually covered in texts about survival analysis. The Wikipedia article [1] has some brief remarks about censored data that might help too.
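To make the correction concrete, here is the usual form of a right-censored likelihood, written with notation that is my own assumption rather than anything from the question: $f$ and $F$ are the pmf and CDF of the lifetime distribution with parameters $\theta$, $t_i$ is the observed lifetime of an item that has already expired, and $c_j$ is the number of days an item still alive has been observed so far.

$$
L(\theta) \;=\; \prod_{i \,\in\, \text{expired}} f(t_i;\theta) \;\times \prod_{j \,\in\, \text{alive}} \bigl[\,1 - F(c_j;\theta)\,\bigr]
$$

Expired items contribute their probability mass as usual, while each item still alive contributes only the probability of surviving past its current age. That is the 1-CDF(6) term you describe for items observed through the 6th day.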

There is a package for survival analysis in R named 'survival'. There may be other R packages. Dunno about packages for other systems.
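For Python, a censored negative binomial fit can be sketched directly with SciPy. This is a minimal sketch under assumptions, not a definitive implementation: the function names, the (log n, log mean) parametrization, the Nelder-Mead optimizer, and the simulated data (true mean 30, cutoff at 6) are all choices made for illustration. It assumes an expired item with duration d contributes P(X = d), and an item still alive after c observed days contributes P(X > c) = 1 - CDF(c).

```python
import numpy as np
from scipy import optimize, stats


def neg_log_likelihood(params, durations, expired):
    """Negative log-likelihood of a right-censored negative binomial sample."""
    log_n, log_mu = params
    n, mu = np.exp(log_n), np.exp(log_mu)  # keeps both parameters positive
    p = n / (n + mu)                       # SciPy's (n, p) parametrization
    # Expired items: exact lifetime is known, so use the pmf.
    ll_expired = stats.nbinom.logpmf(durations[expired], n, p).sum()
    # Items still alive: only X > duration is known, so use 1 - CDF.
    ll_alive = stats.nbinom.logsf(durations[~expired], n, p).sum()
    return -(ll_expired + ll_alive)


def fit_censored_nbinom(durations, expired):
    """Return (n_hat, mu_hat) maximizing the censored likelihood."""
    start = np.array([0.0, np.log(durations.mean() + 1.0)])
    result = optimize.minimize(
        neg_log_likelihood, start, args=(durations, expired),
        method="Nelder-Mead",
    )
    n_hat, mu_hat = np.exp(result.x)
    return n_hat, mu_hat


# Illustration on simulated data (all numbers here are made up).
rng = np.random.default_rng(0)
true_n, true_mu, cutoff = 0.5, 30.0, 6
true_p = true_n / (true_n + true_mu)
lifetimes = stats.nbinom.rvs(true_n, true_p, size=20000, random_state=rng)

expired = lifetimes <= cutoff              # False means censored at the cutoff
durations = np.minimum(lifetimes, cutoff)  # what is observable so far

n_hat, mu_hat = fit_censored_nbinom(durations, expired)
print("censored-MLE mean estimate:", mu_hat)
print("naive mean of observed durations:", durations.mean())
```

If items arrive on different days, `durations` for the items still alive would simply differ per item (each one's current age). Keep in mind that the tail is extrapolated from only a few days of data, so the estimated mean can be quite uncertain.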

[1] http://en.wikipedia.org/wiki/Survival_analysis