Difference in speed between TensorFlow implementations of the MFCC spectrogram

I am trying to preprocess audio clips for a keyword spotting task that uses machine learning models.

The first step is to compute the spectrogram from the waveform, and I have found two ways to do this within the TensorFlow framework.

The first is to use the tf.signal module, with the following functions:

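import tensorflow as tf

# signals: float32 waveform(s), shape [..., samples]
stft = tf.signal.stft(signals, frame_length, frame_step)
spectrogram = tf.abs(stft)
# linear_to_mel_weight_matrix computed beforehand (tf.signal.linear_to_mel_weight_matrix)
mel_spectrogram = tf.tensordot(spectrogram, linear_to_mel_weight_matrix, 1)
log_mel_spectrogram = tf.math.log(mel_spectrogram + 1.e-6)
mfccs = tf.signal.mfccs_from_log_mel_spectrograms(log_mel_spectrogram)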
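For reference, here is a self-contained sketch of the tf.signal pipeline, including one way to build the mel weight matrix with tf.signal.linear_to_mel_weight_matrix. The parameters (16 kHz audio, 480-sample window, 160-sample hop, 40 mel bins, 13 coefficients, 20 to 4000 Hz) and the synthetic 440 Hz test tone are assumptions chosen for illustration, not values taken from the question.

```python
import numpy as np
import tensorflow as tf

# Assumed example parameters (not from the question).
SAMPLE_RATE = 16000      # Hz
FRAME_LENGTH = 480       # 30 ms window
FRAME_STEP = 160         # 10 ms hop
NUM_MEL_BINS = 40
NUM_MFCCS = 13
LOWER_HZ, UPPER_HZ = 20.0, 4000.0


def tf_signal_mfccs(waveform: tf.Tensor) -> tf.Tensor:
    """MFCCs from a 1-D float32 waveform using only tf.signal ops."""
    stft = tf.signal.stft(waveform, frame_length=FRAME_LENGTH, frame_step=FRAME_STEP)
    spectrogram = tf.abs(stft)  # magnitude, shape [frames, fft_length // 2 + 1]
    # Matrix mapping linear-frequency STFT bins to mel bins.
    mel_matrix = tf.signal.linear_to_mel_weight_matrix(
        num_mel_bins=NUM_MEL_BINS,
        num_spectrogram_bins=spectrogram.shape[-1],
        sample_rate=SAMPLE_RATE,
        lower_edge_hertz=LOWER_HZ,
        upper_edge_hertz=UPPER_HZ,
    )
    mel_spectrogram = tf.tensordot(spectrogram, mel_matrix, 1)  # [frames, mel bins]
    log_mel = tf.math.log(mel_spectrogram + 1e-6)
    # One coefficient is returned per mel bin; keep only the first NUM_MFCCS.
    return tf.signal.mfccs_from_log_mel_spectrograms(log_mel)[..., :NUM_MFCCS]


# One second of a 440 Hz tone as a stand-in for a real clip.
t = np.arange(SAMPLE_RATE, dtype=np.float32) / SAMPLE_RATE
waveform = tf.constant(0.5 * np.sin(2 * np.pi * 440.0 * t), dtype=tf.float32)
mfccs = tf_signal_mfccs(waveform)
print(tuple(mfccs.shape))  # (98, 13)
```

Since mfccs_from_log_mel_spectrograms produces as many coefficients as there are mel bins, the truncation to 13 is what makes the output comparable to a fixed dct_coefficient_count.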

The second is to use the tf.raw_ops module, which results in the following code:

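import tensorflow as tf

# spectrogram computation
# sample: float32 tensor of shape [samples, channels], e.g. tf.expand_dims(waveform, -1)
spectrogram = tf.raw_ops.AudioSpectrogram(
    input=sample,
    window_size=window_size_samples,
    stride=window_stride_samples,
    magnitude_squared=True,  # Mfcc expects a squared-magnitude spectrogram
)

# mfcc computation
mfcc_features = tf.raw_ops.Mfcc(
    spectrogram=spectrogram,
    sample_rate=sample_rate,
    dct_coefficient_count=dct_coefficient_count,
)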
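A matching self-contained sketch for the tf.raw_ops pipeline, again assuming 16 kHz audio, a 480-sample window, a 160-sample stride and 13 coefficients with a synthetic test tone. AudioSpectrogram takes a 2-D [samples, channels] input and returns a [channels, frames, bins] tensor; the remaining Mfcc attributes are left at their defaults (40 filterbank channels, 20 Hz to 4000 Hz).

```python
import numpy as np
import tensorflow as tf

# Assumed example parameters (not from the question).
SAMPLE_RATE = 16000
WINDOW_SIZE = 480   # 30 ms window
STRIDE = 160        # 10 ms hop
NUM_MFCCS = 13


def raw_ops_mfccs(waveform: tf.Tensor) -> tf.Tensor:
    """MFCCs from a 1-D float32 waveform using the AudioSpectrogram/Mfcc ops."""
    # AudioSpectrogram expects [samples, channels]; add a mono channel axis.
    audio = tf.expand_dims(waveform, -1)
    spectrogram = tf.raw_ops.AudioSpectrogram(
        input=audio,
        window_size=WINDOW_SIZE,
        stride=STRIDE,
        magnitude_squared=True,  # Mfcc is documented to expect squared magnitudes
    )  # shape [channels, frames, fft_length // 2 + 1]
    return tf.raw_ops.Mfcc(
        spectrogram=spectrogram,
        sample_rate=SAMPLE_RATE,
        dct_coefficient_count=NUM_MFCCS,
    )  # shape [channels, frames, NUM_MFCCS]


# One second of a 440 Hz tone as a stand-in for a real clip.
t = np.arange(SAMPLE_RATE, dtype=np.float32) / SAMPLE_RATE
waveform = tf.constant(0.5 * np.sin(2 * np.pi * 440.0 * t), dtype=tf.float32)
mfccs = raw_ops_mfccs(waveform)
print(tuple(mfccs.shape))  # (1, 98, 13)
```

The leading channel axis is kept in the output, so a mono clip needs a [0] index (or tf.squeeze) before it lines up with the 2-D output of the tf.signal version.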

The problem is that the second approach is much faster (roughly 10x), as this table shows:

Operation    tf.signal    tf.raw_ops
STFT         5.09 ms      0.47 ms
Mel+MFCC     3.05 ms      0.25 ms
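One way to measure the gap more carefully is to time both pipelines on the same input, both eagerly and wrapped in tf.function. Whether per-op dispatch overhead in eager mode explains part of the difference is a hypothesis this sketch can test, not a confirmed explanation. The one-second random input, the framing parameters and the repetition count are assumptions; .numpy() is called on each result so that any asynchronous device execution is included in the measured time.

```python
import timeit

import tensorflow as tf

# Assumed example parameters (not from the question).
SAMPLE_RATE, FRAME, HOP, NUM_MEL, NUM_MFCC = 16000, 480, 160, 40, 13
NUM_BINS = 512 // 2 + 1  # stft pads frames to the next power of two (512)
MEL_MATRIX = tf.signal.linear_to_mel_weight_matrix(
    num_mel_bins=NUM_MEL, num_spectrogram_bins=NUM_BINS, sample_rate=SAMPLE_RATE,
    lower_edge_hertz=20.0, upper_edge_hertz=4000.0)


def signal_pipeline(x):
    spec = tf.abs(tf.signal.stft(x, FRAME, HOP))
    log_mel = tf.math.log(tf.tensordot(spec, MEL_MATRIX, 1) + 1e-6)
    return tf.signal.mfccs_from_log_mel_spectrograms(log_mel)[..., :NUM_MFCC]


def raw_pipeline(x):
    spec = tf.raw_ops.AudioSpectrogram(
        input=tf.expand_dims(x, -1), window_size=FRAME, stride=HOP,
        magnitude_squared=True)
    return tf.raw_ops.Mfcc(
        spectrogram=spec, sample_rate=SAMPLE_RATE, dct_coefficient_count=NUM_MFCC)


def ms_per_call(fn, x, number=20):
    """Average wall-clock milliseconds per call, after one warm-up call."""
    fn(x).numpy()  # warm-up; also traces the graph when fn is a tf.function
    return 1000.0 * timeit.timeit(lambda: fn(x).numpy(), number=number) / number


waveform = tf.random.uniform([SAMPLE_RATE], -1.0, 1.0)
results = {}
for name, fn in [("tf.signal", signal_pipeline), ("tf.raw_ops", raw_pipeline)]:
    results[name + " (eager)"] = ms_per_call(fn, waveform)
    results[name + " (tf.function)"] = ms_per_call(tf.function(fn), waveform)

for name, ms in results.items():
    print(f"{name:26s} {ms:7.3f} ms")
```

Absolute numbers depend on hardware and TensorFlow version; the useful signal is how the ratio between the two pipelines changes once both run as graphs.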

Both cases use the same parameters (window size, hop size, number of coefficients, ...), and in my tests the outputs agree up to the third decimal place.
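The agreement check can be sketched stage by stage: first the power spectrograms, then the MFCCs. The spectrograms are expected to match closely if both implementations apply a Hann window and use the same FFT size, which is an assumption this check verifies rather than a documented guarantee. The framing parameters and the synthetic test tone are also assumptions. The MFCC difference is only printed, because the two mel filterbanks are separate implementations whose edge handling and log floor may differ.

```python
import numpy as np
import tensorflow as tf

# Assumed example parameters (not from the question).
SAMPLE_RATE, FRAME, HOP, NUM_MEL, NUM_MFCC = 16000, 480, 160, 40, 13

t = np.arange(SAMPLE_RATE, dtype=np.float32) / SAMPLE_RATE
waveform = tf.constant(0.5 * np.sin(2 * np.pi * 440.0 * t), dtype=tf.float32)

# Stage 1: power spectrograms from both implementations, shape [frames, bins].
magnitude = tf.abs(tf.signal.stft(waveform, FRAME, HOP))
power_signal = magnitude ** 2
spec_raw = tf.raw_ops.AudioSpectrogram(
    input=tf.expand_dims(waveform, -1), window_size=FRAME, stride=HOP,
    magnitude_squared=True)  # [channels, frames, bins]
power_raw = spec_raw[0]

# Stage 2: MFCCs (difference reported, not asserted).
mel_matrix = tf.signal.linear_to_mel_weight_matrix(
    num_mel_bins=NUM_MEL, num_spectrogram_bins=magnitude.shape[-1],
    sample_rate=SAMPLE_RATE, lower_edge_hertz=20.0, upper_edge_hertz=4000.0)
log_mel = tf.math.log(tf.tensordot(magnitude, mel_matrix, 1) + 1e-6)
mfcc_signal = tf.signal.mfccs_from_log_mel_spectrograms(log_mel)[..., :NUM_MFCC]
mfcc_raw = tf.raw_ops.Mfcc(
    spectrogram=spec_raw, sample_rate=SAMPLE_RATE,
    dct_coefficient_count=NUM_MFCC)[0]

spec_diff = float(tf.reduce_max(tf.abs(power_signal - power_raw)))
mfcc_diff = float(tf.reduce_max(tf.abs(mfcc_signal - mfcc_raw)))
print(f"max |spectrogram diff| = {spec_diff:.3e}")
print(f"max |MFCC diff|        = {mfcc_diff:.3e}")
```

Separating the two stages shows whether any discrepancy comes from the STFT itself or from the mel filterbank and DCT that follow it.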

My question is: does anyone have experience with these functions, or can anyone explain this behavior?
