I loaded an MP3 file in Python with torchaudio and librosa:
import torchaudio
import librosa
filename='example.mp3'
array_tor, sample_rate_tor = torchaudio.load(filename,format='mp3')
array_lib, sample_rate_lib = librosa.load(filename, sr=sample_rate_tor)
print(len(array_tor.numpy()[0]), len(array_lib))  # the two lengths differ
The two arrays have different lengths. What makes them differ, and how can I make them the same?
If I convert example.mp3 to a WAV file with pydub:
from pydub import AudioSegment
audSeg = AudioSegment.from_mp3('example.mp3')
audSeg.export('example.wav', format="wav")
and then load the WAV file with torchaudio, librosa, and soundfile:
import torchaudio
import librosa
import soundfile as sf
filename='example.wav'
array_tor_w, sample_rate_tor_w = torchaudio.load(filename,format='wav')
array_lib_w, sample_rate_lib_w = librosa.load(filename, sr=sample_rate_tor_w)
array_sfl_w, sample_rate_sfl_w = sf.read(filename)
print(len(array_tor_w.numpy()[0]), len(array_lib_w), len(array_sfl_w))  # all three lengths match
all three arrays have the same length and content, and that length also matches len(array_lib) from the MP3 file.
So it seems that torchaudio.load() handles MP3 files differently.
This is due to the underlying decoder library that torchaudio uses.
Up until v0.11, torchaudio used libmad, which does not remove the extra padding when decoding MP3.
See https://github.com/pytorch/audio/issues/1500 for details.
In v0.12, torchaudio switched its MP3 decoder to FFmpeg, which should resolve the padding issue.
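To tell which case applies to a given installation, a version check along these lines may help. This is only a minimal sketch: the helper names parse_major_minor and mp3_padding_expected are hypothetical (they are not part of torchaudio), and the v0.12 cutoff is taken from the explanation above rather than verified against every release.

```python
import re


def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '0.11.0+cu113'.

    Hypothetical helper for illustration; not part of torchaudio.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def mp3_padding_expected(version):
    """Return True if this torchaudio version predates v0.12.

    Assumption (from the answer above): versions up to v0.11 decode MP3
    with libmad and keep the extra padding, while v0.12 and later use FFmpeg.
    """
    return parse_major_minor(version) < (0, 12)
```

With torchaudio installed, calling mp3_padding_expected(torchaudio.__version__) indicates whether the length mismatch with librosa is likely; if it returns True, upgrading to v0.12 or later, or converting the file to WAV first as in the question, should give matching lengths.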