I am new to the machine learning domain. Currently, I am trying to implement an audio language detection system based on the MFCCs, deltas, delta-deltas, and mel spectrum coefficients of an audio file. These features are extracted using librosa, which returns a 2D matrix of MFCCs for each file. The problem is that I want to train a Gaussian Mixture Model on them. scikit-learn takes its input in the format `(n_samples, n_features)`, but across all files I have a 3D matrix of the form `(n_samples, n_mfcc, n_time)`, built from the output of `librosa.feature.mfcc()`. How can I provide a 3D input to a GMM?
Also, is there a way to feed all four of the feature types mentioned above into the model together?
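
To make the shapes concrete, here is a minimal sketch of one possible way to flatten the data, assuming each time frame can be treated as a separate sample. The arrays are random stand-ins for the librosa output, and every size and variable name (`n_files`, `n_mels`, `stacked`, `X`) is made up for illustration rather than taken from my actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for librosa output; all sizes here are hypothetical.
n_files, n_mfcc, n_mels, n_time = 3, 13, 40, 50
mfcc = rng.normal(size=(n_files, n_mfcc, n_time))
delta = rng.normal(size=(n_files, n_mfcc, n_time))
delta2 = rng.normal(size=(n_files, n_mfcc, n_time))
mel = rng.normal(size=(n_files, n_mels, n_time))

# Stack the four feature types along the coefficient axis:
# (n_files, 3 * n_mfcc + n_mels, n_time)
stacked = np.concatenate([mfcc, delta, delta2, mel], axis=1)

# Put the time axis next to the file axis, then merge the two so that
# every frame becomes one row: (n_files * n_time, n_features)
X = stacked.transpose(0, 2, 1).reshape(-1, stacked.shape[1])

print(X.shape)  # (150, 79)
```

An array shaped like `X` is 2D, so it matches the `(n_samples, n_features)` layout that scikit-learn expects. This sketch assumes every file has the same number of frames; with real recordings of different lengths, the per-file `(n_time, n_features)` matrices would probably have to be joined with `np.vstack` instead of being held in a single 3D array.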