I am trying to decompose the spectrum of a mixture of compounds into its main components ("pure compounds") by means of non-negative matrix factorization (NMF).
mixed_spectrum is a NumPy array holding the spectrum of the mixture,
and W_init contains the spectra of the known pure compounds as column vectors, plus one extra column for unknown substances.
My script for the analysis is the following:
import numpy as np
from sklearn.decomposition import NMF
pure_compounds /= np.max(pure_compounds)
n_components = len(pure_compounds) + 1
model = NMF(n_components=n_components, init='custom')
W_init = np.concatenate((pure_compounds, np.ones((pure_compounds.shape[0], 1))), axis=1)
print(W_init)
print(np.shape(W_init))
mix = mixed_spectrum.reshape(-1, 1)
print(mix)
print(np.shape(mix))
W = model.fit_transform(mix, W=W_init)
H = model.components_
for i in range(n_components):
    coeff = W[:, i] / np.sum(W, axis=1)
    print('Concentration coefficients for compound', i, ':', coeff)
This gives me the following output:
[[0.00000000e+00 2.76922293e-04 1.44273203e-03 0.00000000e+00
1.00000000e+00]
[0.00000000e+00 2.85985939e-04 1.40933148e-03 0.00000000e+00
1.00000000e+00]
[0.00000000e+00 2.54579755e-04 1.52763958e-03 0.00000000e+00
1.00000000e+00]
...
[1.00000000e-06 1.00000000e-06 1.00000000e-06 1.00000000e-06
1.00000000e+00]
[1.00000000e-06 1.00000000e-06 1.00000000e-06 1.00000000e-06
1.00000000e+00]
[1.00000000e-06 1.00000000e-06 1.00000000e-06 1.00000000e-06
1.00000000e+00]]
(8954, 5)
[[2.31330983e-02]
[1.00000000e-06]
[1.61938395e-02]
...
[2.42525353e-02]
[1.95440454e-02]
[2.98149646e-03]]
(8954, 1)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_6528\919808671.py in <module>
8 print(mix)
9 print(np.shape(mix))
---> 10 W = model.fit_transform(mix, W=W_init)
11 H = model.components_
12 for i in range(n_components):
~\Anaconda3\lib\site-packages\sklearn\decomposition\_nmf.py in fit_transform(self, X, y, W, H)
1536
1537 with config_context(assume_finite=True):
-> 1538 W, H, n_iter = self._fit_transform(X, W=W, H=H)
1539
1540 self.reconstruction_err_ = _beta_divergence(
~\Anaconda3\lib\site-packages\sklearn\decomposition\_nmf.py in _fit_transform(self, X, y, W, H, update_H)
1595
1596 # initialize or check W and H
-> 1597 W, H = self._check_w_h(X, W, H, update_H)
1598
1599 # scale the regularization terms
~\Anaconda3\lib\site-packages\sklearn\decomposition\_nmf.py in _check_w_h(self, X, W, H, update_H)
1460 n_samples, n_features = X.shape
1461 if self.init == "custom" and update_H:
-> 1462 _check_init(H, (self._n_components, n_features), "NMF (input H)")
1463 _check_init(W, (n_samples, self._n_components), "NMF (input W)")
1464 if H.dtype != X.dtype or W.dtype != X.dtype:
~\Anaconda3\lib\site-packages\sklearn\decomposition\_nmf.py in _check_init(A, shape, whom)
52
53 def _check_init(A, shape, whom):
---> 54 A = check_array(A)
55 if np.shape(A) != shape:
56 raise ValueError(
~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
759 # If input is scalar raise error
760 if array.ndim == 0:
--> 761 raise ValueError(
762 "Expected 2D array, got scalar array instead:\narray={}.\n"
763 "Reshape your data either using array.reshape(-1, 1) if "
ValueError: Expected 2D array, got scalar array instead:
array=None.
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
How can I fix the issue? I tried to play with the shape of the mix and W_init variables, but without success.
I solved the issue. When choosing a custom initialization for an NMF decomposition, one has to define both the initial W and H matrices.
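The shapes that the two initial matrices need can be read from the _check_init calls in the traceback: W must be (n_samples, n_components) and H must be (n_components, n_features). A minimal NumPy-only sketch of that shape relation, using the dimensions printed in the question and placeholder values (the zeros and ones are assumptions for illustration, not real spectra):

```python
import numpy as np

# Dimensions taken from the printed shapes in the question.
n_samples, n_features, n_components = 8954, 1, 5

# Placeholder arrays (illustrative values only).
X = np.zeros((n_samples, n_features))
W_init = np.ones((n_samples, n_components))
H_init = np.ones((n_components, n_features))

# With init='custom', both matrices are checked against these shapes.
assert W_init.shape == (X.shape[0], n_components)
assert H_init.shape == (n_components, X.shape[1])

# The product W @ H must reproduce the shape of X.
assert (W_init @ H_init).shape == X.shape
```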
Furthermore, one has to make sure that all arrays X, W and H are C-contiguous. If a NumPy array has to be transposed, it is necessary to call .copy() on the result to ensure contiguity.
My corrected code:
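What follows is a minimal sketch of what such a fix could look like, not the exact original listing. It assumes synthetic stand-in data (pure_compounds and mixed_spectrum are generated randomly here, and true_coeffs is a hypothetical mixing vector), and it assumes the mixture is treated as a single sample, so that H holds the component spectra as rows and W holds the mixing weights. np.ascontiguousarray is used in place of .T.copy(); both give a C-contiguous array.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumption): 4 pure spectra sampled at 200 points.
n_points, n_pure = 200, 4
pure_compounds = rng.random((n_points, n_pure))
true_coeffs = np.array([0.5, 0.2, 0.2, 0.1])  # hypothetical mixing weights
mixed_spectrum = pure_compounds @ true_coeffs

# Known compounds plus one extra component for unknown substances.
n_components = n_pure + 1

# X: one sample (the mixture) with n_points features.
X = np.ascontiguousarray(mixed_spectrum.reshape(1, -1), dtype=np.float64)

# H: component spectra as rows, shape (n_components, n_points).
# The transpose is a non-contiguous view, so force a C-contiguous copy.
H_init = np.concatenate((pure_compounds, np.ones((n_points, 1))), axis=1)
H_init = np.ascontiguousarray(H_init.T, dtype=np.float64)

# W: mixing weights, shape (1, n_components); same dtype as X and H.
W_init = np.full((1, n_components), 1.0 / n_components, dtype=np.float64)

model = NMF(n_components=n_components, init="custom", max_iter=500)
W = model.fit_transform(X, W=W_init, H=H_init)
H = model.components_

# Normalise the weights so that they sum to 1.
coeffs = W[0] / W.sum()
for i, c in enumerate(coeffs):
    print("Concentration coefficient for compound", i, ":", c)
```

Note that fitting updates H as well as W, so the rows of model.components_ can drift away from the pure spectra supplied in H_init; the normalised weights are therefore only an estimate of the concentrations.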