Does the "precision" of audio files matter when training ASR systems?

I am resampling audio files from 8 kHz to 16 kHz with torchaudio.

An example of an original file:

Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 8000 Hz, 1 channels, s16, 128 kb/s

After resampling, it becomes:

Stream #0:0: Audio: pcm_f32le ([3][0][0][0] / 0x0003), 16000 Hz, 1 channels, flt, 512 kb/s

So the sample format has changed from pcm_s16le (16-bit signed integer) to pcm_f32le (32-bit float).

I'd like to know whether this matters for training ASR systems.

There is 1 solution below

Actually, Kaldi's documentation says "Support only KSDATAFORMAT_SUBTYPE_PCM for now." This makes pcm_f32le (which is of type KSDATAFORMAT_SUBTYPE_IEEE_FLOAT) incompatible. So, save only in an integer PCM format:

torchaudio.save(path, waveform, sample_rate, encoding="PCM_S", bits_per_sample=16)

And if you want to increase the audio precision, do so only by increasing bits_per_sample (with the PCM_S encoding).
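
To check which format a saved file actually ended up in, you can read the format tag from the WAV header. This is the same value ffprobe prints in the question (0x0001 for integer PCM, 0x0003 for IEEE float). Below is a minimal sketch using only the Python standard library. The helper name wav_format is made up for illustration, and it assumes a plain RIFF/WAVE file with a standard fmt chunk:

```python
import struct

def wav_format(path):
    """Return (format_tag, bits_per_sample) read from a WAV file's fmt chunk.

    format_tag 1 means integer PCM, 3 means IEEE float.
    Note: a tag of 0xFFFE (WAVE_FORMAT_EXTENSIBLE) stores the real subtype
    in a GUID further inside the chunk; this sketch does not decode it.
    """
    with open(path, "rb") as f:
        riff, _, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no fmt chunk found")
            chunk_id, size = struct.unpack("<4sI", header)
            if chunk_id == b"fmt ":
                fmt = f.read(size)
                # First 16 bytes: tag, channels, sample rate, byte rate,
                # block align, bits per sample.
                tag, _, _, _, _, bits = struct.unpack("<HHIIHH", fmt[:16])
                return tag, bits
            # Skip other chunks; chunks are padded to an even byte count.
            f.seek(size + (size & 1), 1)
```

For a file written with encoding="PCM_S" and bits_per_sample=16, you would expect this to report tag 1 and 16 bits, while the resampled file from the question should report tag 3 and 32 bits.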

As for your actual question, it most likely depends on your dataset. So perhaps try both ways and pick the better-performing one?
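
To get a feel for how much the sample format itself can change the data, here is a small standard-library sketch (not torchaudio code; the helper names and the 440 Hz test tone are assumptions made for illustration). It checks two things: that every 16-bit integer sample survives a trip through float32 unchanged, and how large the rounding error is when float samples, such as the output of a resampler, are stored as 16-bit PCM:

```python
import math
import struct

def to_float32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def quantize_int16(x):
    """Map a float sample in [-1.0, 1.0] to a signed 16-bit integer (clipped)."""
    return max(-32768, min(32767, round(x * 32768)))

# 1) int16 -> float32 -> int16: float32 has a 24-bit significand,
#    so all 65536 possible 16-bit values are represented exactly.
lossless = all(
    quantize_int16(to_float32(q / 32768)) == q for q in range(-32768, 32768)
)

# 2) float -> int16: a synthetic tone standing in for resampled audio
#    (hypothetical test signal: 440 Hz, amplitude 0.5, one second at 16 kHz).
samples = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
max_err = max(abs(x - quantize_int16(x) / 32768) for x in samples)

print("int16 survives float32 round trip:", lossless)
print("max error after 16-bit quantization:", max_err)
```

As long as nothing clips, the error in the second case is bounded by half a quantization step, that is 1/65536 of full scale, or roughly 96 dB below full scale. Whether a difference of that size affects a particular ASR model is still something to confirm empirically on your own data.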