I am resampling audio files from 8 kHz to 16 kHz with torchaudio.
An example of an original file:
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 8000 Hz, 1 channels, s16, 128 kb/s
After resampling, it becomes:
Stream #0:0: Audio: pcm_f32le ([3][0][0][0] / 0x0003), 16000 Hz, 1 channels, flt, 512 kb/s
So the sample format has changed to pcm_f32le (32-bit IEEE float).
I'd like to know whether this matters for training ASR systems.
Kaldi's documentation says "Support only KSDATAFORMAT_SUBTYPE_PCM for now." That makes pcm_f32le (which is of type KSDATAFORMAT_SUBTYPE_IEEE_FLOAT) incompatible. So save only in a PCM format, e.g. by passing encoding="PCM_S" and bits_per_sample=16 to torchaudio.save. If you want to increase audio precision, do so only by increasing bits_per_sample (with the PCM_S encoding).

As for your actual question, it most likely depends on your dataset, so perhaps try both ways and pick the one that performs better.
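To check which format a WAV file actually uses, read the format tag from its fmt chunk. 0x0001 is integer PCM and 0x0003 is IEEE float, which are the values ffprobe prints in brackets in the question. The sketch below uses only the Python standard library. It also handles WAVE_FORMAT_EXTENSIBLE headers, where the real type is stored in the first two bytes of the KSDATAFORMAT_SUBTYPE_* GUID. The save_pcm16 helper is a hypothetical, torchaudio-independent fallback. It converts float samples in [-1, 1] to 16-bit PCM by clipping and scaling by 32767, which is one common convention and is assumed here. With torchaudio itself you would normally just pass a PCM encoding to torchaudio.save.

```python
import array
import struct
import sys
import wave

# WAVE format tags from the fmt chunk (ffprobe shows them as 0x0001 / 0x0003).
WAVE_FORMAT_PCM = 0x0001
WAVE_FORMAT_IEEE_FLOAT = 0x0003
WAVE_FORMAT_EXTENSIBLE = 0xFFFE


def wav_format_tag(path):
    """Return (format_tag, bits_per_sample) read from a WAV file's fmt chunk."""
    with open(path, "rb") as f:
        riff, _size, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError(f"{path} is not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no fmt chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            if chunk_id == b"fmt ":
                fmt = f.read(chunk_size)
                format_tag, _ch, _rate, _byte_rate, _align, bits = struct.unpack(
                    "<HHIIHH", fmt[:16]
                )
                if format_tag == WAVE_FORMAT_EXTENSIBLE and chunk_size >= 40:
                    # Extensible header: the real type is the start of the SubFormat GUID.
                    format_tag = struct.unpack("<H", fmt[24:26])[0]
                return format_tag, bits
            # Skip any other chunk; RIFF chunks are padded to an even size.
            f.seek(chunk_size + (chunk_size & 1), 1)


def save_pcm16(path, samples, sample_rate):
    """Hypothetical helper: write mono float samples as 16-bit signed PCM."""
    # Clip to [-1, 1] and scale by 32767 (assumed convention).
    pcm = array.array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(sample_rate)
        w.writeframes(pcm.tobytes())
```

For example, save_pcm16("out.wav", samples, 16000) followed by wav_format_tag("out.wav") should report (1, 16), i.e. 16-bit integer PCM. A file saved as pcm_f32le reports (3, 32) instead, which is the combination Kaldi rejects.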