SciKit Gaussian Mixture Model ValueError: x and y must have same first dimension


I'm trying to follow this tutorial for GMM with Python SciKit. The problem is that the original code does not work out of the box: it reports problems with the shape of the input arrays and says that GMM is now deprecated. I've tried to rewrite it as:

np.random.seed(2)
x = np.concatenate([np.random.normal(0, 2, 200),
                    np.random.normal(5, 5, 200),
                    np.random.normal(3, 0.5, 600)])
x = np.reshape(x, (-1, 1))

plt.hist(x, 80, normed=True)
plt.xlim(-10, 20)
clf = GaussianMixture(4, max_iter=500, random_state=3).fit(x)
xpdf = np.linspace(-10, 20, 1000)
xpdf = np.reshape(xpdf, (-1, 1))
density = np.exp(clf.score(xpdf))

plt.hist(x, 80, normed=True, alpha=0.5)
plt.plot(xpdf, density, '-r')
plt.xlim(-10, 20)

But I still get a ValueError: x and y must have same first dimension. As far as I can understand, the problem has now moved from the shape of the input arrays to the shape of the density variable, but I'm not sure what is actually going on. Could anyone please shed some light on this? Thanks.

Best answer

If you check the shape of density the problem will be much clearer:

>>> density.shape
()

The score method returns the average per-sample log-likelihood of the dataset it is passed, which is just a single scalar value. You want score_samples, which returns the log-likelihood of each individual point.
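As a rough sketch of that fix, the code from the question could be adapted as follows. It assumes a reasonably recent scikit-learn and Matplotlib; the switch from normed=True to density=True in plt.hist is my own adjustment for newer Matplotlib releases and is not part of the original question.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture

# Same synthetic data as in the question, reshaped to (n_samples, 1).
np.random.seed(2)
x = np.concatenate([np.random.normal(0, 2, 200),
                    np.random.normal(5, 5, 200),
                    np.random.normal(3, 0.5, 600)]).reshape(-1, 1)

clf = GaussianMixture(n_components=4, max_iter=500, random_state=3).fit(x)

# Evaluate the fitted model on a grid of points.
xpdf = np.linspace(-10, 20, 1000)

# score_samples returns one log-likelihood per row, shape (1000,),
# whereas score would collapse everything into a single scalar.
log_density = clf.score_samples(xpdf.reshape(-1, 1))
density = np.exp(log_density)

# Assumption: density=True stands in for the older normed=True argument.
plt.hist(x.ravel(), 80, density=True, alpha=0.5)
plt.plot(xpdf, density, '-r')
plt.xlim(-10, 20)
```

With this change, xpdf and density both have 1000 entries, so plt.plot no longer complains about mismatched first dimensions. In an interactive session, call plt.show() afterwards to display the figure.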

It's possible the API may have changed here since the tutorial was written -- I'm not sure.