I'm using 16 kHz, mono WAV files for my CNN project. Here is the code I use for MFCC generation:
import librosa
import numpy as np

# Load at the file's native rate (the files are 16 kHz mono).
signal, sr = librosa.load('test.wav', sr=None)

# 160 samples at 16 kHz is a 0.01 s excerpt; n_fft=160 means one analysis window.
# The mean over the time axis collapses the MFCC matrix into a single 20-dim vector.
mfccs = np.mean(librosa.feature.mfcc(y=signal[0:160], sr=16000, n_fft=160,
                                     n_mfcc=20, n_mels=50).T, axis=0)
The problem is that when I use this MFCC-based TensorFlow model for testing and prediction, nearly all predictions come out at roughly 0.99. When I use a longer input slice, e.g. y=signal[0:1600], the predictions are much better.
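Out of curiosity I checked how many STFT frames librosa actually produces for each slice length with the settings above (the zero array is just a stand-in for the real signal, and the default hop_length is assumed):

import numpy as np
import librosa

sr = 16000
for n in (160, 1600):
    excerpt = np.zeros(n, dtype=np.float32)   # stand-in for signal[0:n]
    m = librosa.feature.mfcc(y=excerpt, sr=sr, n_fft=160, n_mfcc=20, n_mels=50)
    print(n, f"{n / sr:.2f} s", m.shape)      # 160 samples -> a single frame; 1600 -> several frames

So the 0.01 s slice seems to give only one frame to average over, which may be part of why the features are so uninformative.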
How do I generate good-quality MFCCs for an input signal of 0.01 seconds, i.e. signal[0:160]?
Is it possible to write a function that takes three parameters (y, sr, and numpoints) and generates the best possible MFCCs by calculating the remaining parameters automatically?
def mfcc(y: np.ndarray, sr: int, numpoints: int) -> np.ndarray:
    # calculate the remaining parameters (n_fft, hop_length, n_mels, ...) here
    return librosa.feature.mfcc(...)
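Something like the following is what I have in mind. It is only a rough sketch: the heuristics for n_fft, hop_length, n_mels, and n_mfcc below are placeholders I made up, not recommended values.

import numpy as np
import librosa

def mfcc(y: np.ndarray, sr: int, numpoints: int) -> np.ndarray:
    """Return a single averaged MFCC vector for the first `numpoints` samples.

    Sketch only: the parameter choices (window spanning the whole excerpt,
    75% overlap, mel/coefficient counts capped by the FFT size) are guesses.
    """
    excerpt = y[:numpoints]
    n_fft = numpoints                       # one window covering the whole excerpt
    hop_length = max(1, numpoints // 4)     # 75% overlap if the excerpt yields more frames
    n_mels = min(50, max(10, n_fft // 8))   # fewer mel bands when frequency resolution is coarse
    n_mfcc = min(20, n_mels)                # cannot keep more coefficients than mel bands
    m = librosa.feature.mfcc(y=excerpt, sr=sr, n_fft=n_fft, hop_length=hop_length,
                             n_mfcc=n_mfcc, n_mels=n_mels)
    return np.mean(m.T, axis=0)             # average over time frames, as in the snippet above

If there is a principled way to derive these parameters from numpoints and sr (rather than my ad hoc caps), that is exactly what I'm looking for.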