Assume the pdf (probability density function) of my dataset is as below. There are two distributions in the dataset, and I want to fit them with a Gamma and a Gaussian distribution.
I have read this, but it applies only to two Gaussian distributions: How to use python to separate two gaussian curves?
Here are the steps that I would like to follow:
- manually estimate the mean and variance of the Gaussian distribution
- based on the estimated mean and variance, create the pdf of the Gaussian distribution
- subtract the pdf of the Gaussian from the original pdf
- fit a Gamma distribution to the remaining pdf
I am able to do steps 1 to 3, but for step 4 I do not know how to fit a Gamma distribution to a pdf (rather than to data samples, so stats.gamma.fit(data) does not work here).
Are the above steps reasonable for this kind of problem, and how can I do step 4 in Python? Any help is appreciated.
Interesting question. One issue I can see is that it will sometimes be difficult to tell which mode is the Gamma and which is the Gaussian.
What I would perhaps do is try an expectation-maximization (EM) algorithm. Given the ambiguity stated above, I would do a few runs and select the best fit.
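As a rough illustration of the EM idea, the sketch below alternates between computing each sample's responsibility under the current Gamma and Gaussian components and refitting both components with those responsibilities as weights. The function names (em_gamma_gauss, weighted_gamma_mle, mixture_parts), the (shape, scale) and (mean, std) parameter layout, the fixed iteration count, and the assumption that the raw samples are available and strictly positive are all choices made for this example, not something taken from the question.

```python
import numpy as np
from scipy import optimize, special, stats


def weighted_gamma_mle(x, w):
    """Weighted maximum-likelihood estimate of Gamma (shape, scale); needs x > 0."""
    m = np.sum(w * x) / np.sum(w)
    # s = log(weighted mean) - weighted mean of log(x); non-negative by Jensen.
    s = np.log(m) - np.sum(w * np.log(x)) / np.sum(w)
    s = float(np.clip(s, 1e-6, 100.0))  # keep the root inside the bracket below
    # The shape k solves log(k) - digamma(k) = s.
    k = optimize.brentq(
        lambda k: np.log(k) - special.digamma(k) - s, 1e-3, 1e6, maxiter=200
    )
    return k, m / k


def mixture_parts(x, weight, gamma_params, gauss_params):
    """Weighted densities of the Gamma and Gaussian components at x."""
    k, theta = gamma_params
    mu, sigma = gauss_params
    p_gamma = weight * stats.gamma.pdf(x, k, scale=theta)
    p_gauss = (1.0 - weight) * stats.norm.pdf(x, mu, sigma)
    return p_gamma, p_gauss


def em_gamma_gauss(x, gamma_init, gauss_init, weight=0.5, n_iter=200):
    """EM for the mixture weight * Gamma + (1 - weight) * Gaussian.

    Returns (weight, (shape, scale), (mean, std), log_likelihood).
    """
    x = np.asarray(x, dtype=float)
    gamma_params, gauss_params = tuple(gamma_init), tuple(gauss_init)
    for _ in range(n_iter):
        # E-step: probability that each sample came from the Gamma component.
        p_gamma, p_gauss = mixture_parts(x, weight, gamma_params, gauss_params)
        r = p_gamma / (p_gamma + p_gauss + 1e-300)
        # M-step: refit the weight and both components using the responsibilities.
        weight = r.mean()
        gamma_params = weighted_gamma_mle(x, r)
        mu = np.sum((1.0 - r) * x) / np.sum(1.0 - r)
        sigma = np.sqrt(np.sum((1.0 - r) * (x - mu) ** 2) / np.sum(1.0 - r))
        gauss_params = (mu, sigma)
    p_gamma, p_gauss = mixture_parts(x, weight, gamma_params, gauss_params)
    loglik = float(np.sum(np.log(p_gamma + p_gauss + 1e-300)))
    return weight, gamma_params, gauss_params, loglik
```

The returned log-likelihood gives one way to compare runs from different starting points when selecting the best fit. The sketch does not guard against a component collapsing to zero weight.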
Perhaps, to speed things up a little, you could try EM with two possible starting points (somewhat similar to your idea):
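- Fit two Gaussians to the data, with parameters g0, g1.
- Run EM for the pair (Gamma, Gaussian), starting with an initial point that is (gamma_params_approx_gauss(g0), g1), where gamma_params_approx_gauss(g0) is a maximum-likelihood estimator of Gamma parameters given Gaussian parameters (see e.g. here).
- Run EM again, this time starting from (gamma_params_approx_gauss(g1), g0).
- Keep the better of the two fits.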
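The two starting points can be sketched as follows. Note that this version of gamma_params_approx_gauss uses simple moment matching (a Gamma with the same mean and variance as the Gaussian) rather than the maximum-likelihood estimator mentioned above, which is a substitution made to keep the example short. The (mean, std) layout for g0 and g1, the helper best_of_two_starts, and the fit callable (standing in for whatever EM routine is used) are all hypothetical.

```python
def gamma_params_approx_gauss(g):
    """Gamma (shape, scale) matching the mean and variance of Gaussian g = (mean, std).

    Moment matching: shape * scale = mean and shape * scale**2 = std**2.
    """
    mean, std = g
    if mean <= 0 or std <= 0:
        raise ValueError("moment matching needs mean > 0 and std > 0")
    return mean ** 2 / std ** 2, std ** 2 / mean


def best_of_two_starts(x, g0, g1, fit):
    """Run `fit` from both role assignments and keep the higher log-likelihood.

    `fit(x, gamma_init, gauss_init)` is a caller-supplied EM routine (an
    assumption of this sketch) that returns a tuple whose last element is
    the final log-likelihood.
    """
    starts = [
        (gamma_params_approx_gauss(g0), g1),  # g0 plays the Gamma mode
        (gamma_params_approx_gauss(g1), g0),  # g1 plays the Gamma mode
    ]
    results = [fit(x, gamma_init, gauss_init) for gamma_init, gauss_init in starts]
    return max(results, key=lambda res: res[-1])
```

Comparing the two runs by log-likelihood is one reasonable reading of "best fit"; since both models have the same number of parameters, no complexity penalty is needed for this comparison.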