Differences in MFCC values


I wrote my own code to extract MFCCs. The algorithm is as follows:

  1. read the data from file.wav
  2. pre-emphasis according to x'[n] = x[n] - 0.97 * x[n-1]
  3. framing the signal: frame length 480, overlap 240
  4. STFT with a 480-sample Hamming window and a 512-point FFT, i.e. 480 samples followed by 32 zeros, where w[n] = 0.54 - 0.46 * cos(2πn/(N-1))
  5. create a filter bank using the formula [formula image]
  6. calculate the energy E[k] = |X[k]|^2 or the periodogram of the signal P[k] = |X[k]|^2 / N
  7. apply the filter bank to the energy (or the periodogram) of the signal and take the log of each weighted sum: S[m] = ln( Σ_k H_m[k] * E[k] )
    or S[m] = ln( Σ_k H_m[k] * P[k] )
  8. apply the DCT: c[n] = Σ_m S[m] * cos(πn(m + 1/2)/M), summing over the M filters m = 0, ..., M-1
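The eight steps above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the poster's code: the filter-bank formula from the post is not available, so the hypothetical helper `mel_filter_bank` builds triangular filters that peak at 1 on the HTK mel scale (via the hypothetical `hz_to_mel`/`mel_to_hz`); the 16 kHz sample rate, 40 filters and 13 coefficients are illustrative choices; and the hypothetical `mfcc` function keeps only frames that fit entirely inside the signal.

```python
import numpy as np

# HTK-style mel scale; an assumption, since the post's filter-bank formula is not shown
def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, fs):
    """Triangular filters that peak at 1, defined on the n_fft // 2 + 1 one-sided bins."""
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / fs).astype(int)
    H = np.zeros((n_filters, n_bins))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):             # rising edge
            H[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):            # falling edge
            H[m - 1, k] = (right - k) / (right - center)
    return H

def mfcc(x, fs, frame_len=480, hop=240, n_fft=512, n_filters=40, n_ceps=13):
    # 2. pre-emphasis: x'[n] = x[n] - 0.97 * x[n-1]
    x = np.append(x[0], x[1:] - 0.97 * x[:-1])
    # 3. framing: only frames lying entirely inside the signal (no padding)
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx]
    # 4. Hamming window (np.hamming is 0.54 - 0.46 cos(2πn/(N-1))), zero-padded FFT
    X = np.fft.rfft(frames * np.hamming(frame_len), n=n_fft)  # (n_frames, n_fft//2 + 1)
    # 6. energy E[k] = |X[k]|^2 (divide by n_fft instead for the periodogram P[k])
    E = np.abs(X) ** 2
    # 5 + 7. filter bank, then the log of each weighted sum (eps avoids log(0))
    H = mel_filter_bank(n_filters, n_fft, fs)
    S = np.log(E @ H.T + np.finfo(float).eps)                 # (n_frames, n_filters)
    # 8. unnormalized DCT-II: c[n] = sum_m S[m] cos(pi n (m + 1/2) / M)
    n = np.arange(n_ceps)[:, None]
    m = np.arange(n_filters)[None, :]
    D = np.cos(np.pi * n * (m + 0.5) / n_filters)
    return S @ D.T                                             # (n_frames, n_ceps)

# Usage with a hypothetical 6400-sample random signal
fs = 16000
rng = np.random.default_rng(0)
x = rng.standard_normal(6400)
C = mfcc(x, fs)
print(C.shape)  # (25, 13)
```

The DCT here follows the formula in step 8 without any scaling factor, so its coefficients are larger than those of an orthonormal DCT by a constant per coefficient.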

Comparing the MFCC values from this method with those from the MATLAB functions, I get approximately the same picture. However, MATLAB apparently applies a different normalization and computes the filter bank differently: my values are roughly an order of magnitude larger than MATLAB's.
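Two normalization choices alone can change the scale of the coefficients without changing the overall picture. The sketch below uses hypothetical random mel-band energies and assumes M = 40 bands and N = 512: dividing by N (periodogram instead of energy) subtracts ln N from every log band energy, which only changes c[0], while an orthonormal DCT-II multiplies c[1:] by sqrt(2/M). The helper `dct2` is hypothetical, and whether MATLAB uses either convention is an assumption to check against its documentation.

```python
import numpy as np

def dct2(S, n_ceps, ortho=False):
    # DCT-II along the last axis: c[n] = sum_m S[m] cos(pi n (m + 1/2) / M)
    M = S.shape[-1]
    n = np.arange(n_ceps)[:, None]
    m = np.arange(M)[None, :]
    D = np.cos(np.pi * n * (m + 0.5) / M)
    if ortho:
        # orthonormal scaling: sqrt(1/M) for n = 0, sqrt(2/M) otherwise
        scale = np.full((n_ceps, 1), np.sqrt(2.0 / M))
        scale[0, 0] = np.sqrt(1.0 / M)
        D = D * scale
    return S @ D.T

rng = np.random.default_rng(1)
M, N = 40, 512
E_mel = rng.uniform(1.0, 100.0, size=M)         # hypothetical band energies, one frame
c_energy = dct2(np.log(E_mel), 13)              # from E[k]
c_periodogram = dct2(np.log(E_mel / N), 13)     # from P[k] = E[k] / N
c_ortho = dct2(np.log(E_mel), 13, ortho=True)   # orthonormal DCT

# Dividing by N shifts every band by -ln(N): only c[0] changes, by M * ln(N)
print(np.allclose(c_energy[1:], c_periodogram[1:]))                # True
print(np.isclose(c_energy[0] - c_periodogram[0], M * np.log(N)))   # True
# The orthonormal DCT shrinks c[1:] by sqrt(2/M), about 0.22 for M = 40
print(np.allclose(c_ortho[1:], c_energy[1:] * np.sqrt(2.0 / M)))   # True
```

The first result holds because the cosine terms for n ≥ 1 sum to zero over the M bands, so a constant offset in S[m] only reaches c[0].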

Also, my implementation produces more frames: a signal of 6400 samples gives 26 ≈ 6400/(480 - 240) frames in my case, but 25 in MATLAB.
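The two frame counts can follow from two framing conventions. A small arithmetic check, assuming the MATLAB side keeps only frames that lie entirely inside the signal (no zero padding at the end):

```python
n_samples, frame_len, overlap = 6400, 480, 240
hop = frame_len - overlap                 # 240 samples between frame starts

# Count used in the question: 6400 / 240 = 26.67, rounded down
count_question = n_samples // hop

# Count when every frame must fit entirely inside the signal
count_full = 1 + (n_samples - frame_len) // hop

print(count_question, count_full)         # 26 25
```

Under the first convention the 26th frame starts at sample 6000 and ends at sample 6479, so it needs 80 samples of zero padding beyond the end of the 6400-sample signal.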

The computations before the filter bank are roughly similar, except that I use a 512-point FFT, whereas MATLAB's stft() returns 241 values per frame, which corresponds to the one-sided spectrum of a 480-point transform (480/2 + 1 = 241). The filter banks also differ by about an order of magnitude, but if I divide my filter bank by 10, the result still does not look like MATLAB's.

[Figure: filter banks]
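A sketch of why a single scale factor may not reconcile the two filter banks, assuming a hypothetical 16 kHz sample rate and HTK mel-spaced edges. First, 512- and 480-point FFTs give different bin counts and bin spacings, so a filter bank built for one grid cannot be applied to the other. Second, if one implementation normalizes each triangle to unit area rather than to a peak of 1 (an assumption, not confirmed for MATLAB), the ratio between the two differs from band to band.

```python
import numpy as np

fs = 16000  # hypothetical sample rate; the question does not state it

# One-sided bin count (n_fft // 2 + 1) and bin spacing in Hz for each FFT length
bins = {}
for n_fft in (512, 480):
    bins[n_fft] = len(np.fft.rfftfreq(n_fft, d=1.0 / fs))
    print(n_fft, bins[n_fft], fs / n_fft)

# 40 mel-spaced triangles need 42 edge frequencies (HTK mel formula, assumed)
n_filters = 40
mel_max = 2595.0 * np.log10(1.0 + (fs / 2) / 700.0)
mel = np.linspace(0.0, mel_max, n_filters + 2)
edges_hz = 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
left, right = edges_hz[:-2], edges_hz[2:]   # filter m spans left[m]..right[m]

peak_heights = np.ones(n_filters)           # each triangle peaks at 1
area_heights = 2.0 / (right - left)         # each triangle has unit area (in Hz)

# Band-by-band ratio of the two conventions: it is not a single constant
ratio = area_heights / peak_heights
print(ratio[0], ratio[-1])
```

The bin counts come out as 257 and 241, and the area-normalized heights shrink as the bands widen toward high frequencies, so dividing the whole filter bank by one constant cannot turn one convention into the other.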

Accordingly, the mel-spectrum plot (step 7) shows that the first peak is missing from my values (right-hand plot), and the MFCC plot shows that the coefficients differ by about an order of magnitude.

[Figure: mel spectrum]

[Figure: MFCC]

What could be causing these differences?

Thank you for your time.

