How to separate two distributions from a pdf (probability density function)?


Assume the pdf (probability density function) of my dataset looks like the one below. The dataset contains two distributions, and I want to fit them with a Gamma and a Gaussian distribution.

[Image: plot of the dataset's pdf]

I have read "How to use python to separate two gaussian curves?", but that applies only to two Gaussian distributions.

Here are the steps that I would like to follow:

  1. manually estimate the mean and variance of the Gaussian distribution
  2. based on the estimated mean and variance, create the pdf of the Gaussian distribution
  3. subtract the pdf of the Gaussian from the original pdf
  4. fit the remaining pdf to a Gamma distribution

I can do steps 1 to 3, but for step 4 I do not know how to fit a Gamma distribution to a pdf (as opposed to data samples, so stats.gamma.fit(data) does not work here).

Are the above steps reasonable for this kind of problem, and how can I do step 4 in Python? Any help is appreciated.


There is 1 best solution below


Interesting question. One issue I can see is that it will sometimes be difficult to tell which mode is the Gamma and which is the Gaussian.

What I would perhaps do is try an expectation-maximization (EM) algorithm. Given the ambiguity stated above, I would do a few runs and select the best fit.
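As a rough illustration, here is a minimal sketch of one EM run for a (Gamma, Gaussian) mixture. It assumes raw samples are available (not only a pdf curve), that the Gamma has its location fixed at 0, and it uses NumPy and SciPy. The names `weighted_gamma_mle` and `em_gamma_gauss`, the synthetic data, and the starting point are all made up for this example.

```python
import numpy as np
from scipy import stats
from scipy.special import digamma, polygamma


def weighted_gamma_mle(x, w, newton_steps=5):
    """Weighted maximum-likelihood Gamma fit (shape k, scale theta), loc fixed at 0."""
    pos = x > 0                       # the Gamma density is zero for x <= 0
    x, w = x[pos], w[pos]
    w = w / w.sum()
    mean = np.sum(w * x)
    # s = log(weighted mean) - weighted mean of log(x); non-negative by Jensen
    s = max(np.log(mean) - np.sum(w * np.log(x)), 1e-12)
    # Closed-form approximation used as the starting value for the shape
    k = (3 - s + np.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)
    for _ in range(newton_steps):     # Newton steps on log(k) - digamma(k) = s
        k -= (np.log(k) - digamma(k) - s) / (1 / k - polygamma(1, k))
    return k, mean / k


def mixture_pdf(x, w, k, theta, mu, sigma):
    """Density of w * Gamma(k, theta) + (1 - w) * Normal(mu, sigma)."""
    return (w * stats.gamma.pdf(x, a=k, scale=theta)
            + (1 - w) * stats.norm.pdf(x, loc=mu, scale=sigma))


def em_gamma_gauss(x, init, n_iter=200, tol=1e-8):
    """Run EM from init = (w, k, theta, mu, sigma); return (params, log-likelihood)."""
    w, k, theta, mu, sigma = init
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibility of the Gamma component for each sample
        p_gam = w * stats.gamma.pdf(x, a=k, scale=theta)
        total = mixture_pdf(x, w, k, theta, mu, sigma) + 1e-300
        r = p_gam / total
        # M-step: weighted updates of each component
        w = r.mean()
        k, theta = weighted_gamma_mle(x, r)
        mu = np.average(x, weights=1 - r)
        sigma = np.sqrt(np.average((x - mu) ** 2, weights=1 - r))
        ll = np.sum(np.log(mixture_pdf(x, w, k, theta, mu, sigma) + 1e-300))
        if abs(ll - prev_ll) < tol:   # stop once the log-likelihood stalls
            break
        prev_ll = ll
    return (w, k, theta, mu, sigma), ll


# Synthetic data (an assumption for the demo): 60% Gamma(2, 1.5), 40% Normal(12, 1.5)
rng = np.random.default_rng(0)
data = np.concatenate([rng.gamma(2.0, 1.5, 3000), rng.normal(12.0, 1.5, 2000)])
params, loglik = em_gamma_gauss(data, init=(0.5, 1.0, 2.0, 10.0, 3.0))
print(params, loglik)
```

The returned log-likelihood is what you would compare across runs to pick the best fit. The M-step for the Gamma has no closed form, so the sketch solves for the shape with a few Newton steps; other solvers would work as well.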

Perhaps, to speed things up a little, you could try EM with two possible starting points (somewhat similar to your idea):

  1. estimate a mixture model of two Gaussians, say g0, g1.
  2. run one EM to solve a mixture (Gamma, Gaussian), starting with an initial point that is (gamma_params_approx_gauss(g0), g1), where gamma_params_approx_gauss(g0) is a maximum-likelihood estimator of Gamma parameters given Gaussian parameters (see e.g. here).
  3. run another EM, but starting with (gamma_params_approx_gauss(g1), g0).
  4. select the best fit among the two EM results.
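The two starting points and the final selection can be sketched as follows. Note that this version of `gamma_params_approx_gauss` uses simple moment matching (same mean and variance), which is an assumption on my part and not necessarily the estimator behind the link above. `candidate_starts` and `best_by_loglik` are hypothetical helper names, and each parameter tuple is laid out as (Gamma weight, shape, scale, Gaussian mean, Gaussian std).

```python
import numpy as np
from scipy import stats


def gamma_params_approx_gauss(mu, sigma):
    """Gamma (shape, scale) with the same mean and variance as Normal(mu, sigma)."""
    if mu <= 0:
        raise ValueError("a Gamma with loc=0 needs a positive mean")
    return (mu / sigma) ** 2, sigma ** 2 / mu


def candidate_starts(g0, g1, w0=0.5):
    """Build both EM starting points from two Gaussians g = (mu, sigma).

    w0 is the mixture weight of g0 in the two-Gaussian fit.
    """
    return [
        (w0, *gamma_params_approx_gauss(*g0), *g1),       # g0 becomes the Gamma
        (1 - w0, *gamma_params_approx_gauss(*g1), *g0),   # g1 becomes the Gamma
    ]


def mixture_loglik(x, w, k, theta, mu, sigma):
    """Log-likelihood of x under w * Gamma(k, theta) + (1 - w) * Normal(mu, sigma)."""
    pdf = (w * stats.gamma.pdf(x, a=k, scale=theta)
           + (1 - w) * stats.norm.pdf(x, loc=mu, scale=sigma))
    return np.sum(np.log(pdf + 1e-300))


def best_by_loglik(x, fits):
    """Return the parameter tuple in fits with the highest log-likelihood."""
    return max(fits, key=lambda p: mixture_loglik(x, *p))


# Example: two Gaussians from step 1 (values invented for illustration)
starts = candidate_starts(g0=(3.0, 2.0), g1=(12.0, 1.5), w0=0.6)
print(starts)
```

Each entry of `starts` would be passed to an EM routine for the (Gamma, Gaussian) mixture, and `best_by_loglik` would then be applied to the two fitted parameter tuples.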