Loading the same audio file with torchaudio and librosa gives arrays of different lengths in Python

I loaded an MP3 file in Python with both torchaudio and librosa:

import torchaudio
import librosa

filename = 'example.mp3'
array_tor, sample_rate_tor = torchaudio.load(filename, format='mp3')
# Pass torchaudio's sample rate so librosa does not resample to its default rate
array_lib, sample_rate_lib = librosa.load(filename, sr=sample_rate_tor)
print(len(array_tor.numpy()[0]), len(array_lib))  # the two lengths differ

The two arrays have different lengths. What causes the difference, and how can I make them the same?

If I convert example.mp3 to a WAV file with pydub:

from pydub import AudioSegment
audSeg = AudioSegment.from_mp3('example.mp3')
audSeg.export('example.wav', format="wav")

and then load the WAV file with torchaudio, librosa, and soundfile:

import torchaudio
import librosa
import soundfile as sf
filename = 'example.wav'
array_tor_w, sample_rate_tor_w = torchaudio.load(filename, format='wav')
array_lib_w, sample_rate_lib_w = librosa.load(filename, sr=sample_rate_tor_w)
array_sfl_w, sample_rate_sfl_w = sf.read(filename)
print(len(array_tor_w.numpy()[0]), len(array_lib_w), len(array_sfl_w))  # all three lengths match

then the three arrays have the same length and content, and that length also matches len(array_lib) from the MP3 file.

So torchaudio.load() seems to treat MP3 files differently.

There is 1 solution below

This is caused by the underlying decoder library that torchaudio uses.

Up until v0.11, torchaudio used libmad, which does not remove the extra padding when decoding MP3, so the decoded array still contains those padding samples.

See https://github.com/pytorch/audio/issues/1500 for details.
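
One rough way to check whether padding is a plausible cause is to express the length difference in samples and in milliseconds. A gap of about a few MP3 frames points to padding rather than to a sample-rate mismatch. The following is only a minimal sketch: `length_mismatch` is a hypothetical helper, not part of torchaudio or librosa, and the numbers are made up for illustration (1152 is the number of samples per frame in MPEG-1 Layer III).

```python
from typing import Tuple


def length_mismatch(n_first: int, n_second: int, sample_rate: int) -> Tuple[int, float]:
    """Return the length difference in samples and in milliseconds."""
    extra_samples = n_first - n_second
    extra_ms = 1000.0 * extra_samples / sample_rate
    return extra_samples, extra_ms


# Illustrative numbers only: one second at 48 kHz plus one 1152-sample frame.
extra_samples, extra_ms = length_mismatch(48000 + 1152, 48000, 48000)
print(extra_samples, extra_ms)
```

With the arrays from the question, this could be called as length_mismatch(len(array_tor.numpy()[0]), len(array_lib), sample_rate_tor).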

In v0.12, torchaudio switched its MP3 decoder to FFmpeg, which should resolve the padding issue.
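
To see which side of that change an installation is on, the version string can be compared against 0.12. This is a sketch under stated assumptions: `mp3_decoder_for` is a hypothetical helper, and its mapping encodes only what is stated above (libmad before v0.12, FFmpeg from v0.12 on). It does not inspect the actual backend, and later releases may behave differently.

```python
import re


def mp3_decoder_for(version: str) -> str:
    """Guess the MP3 decoder from a torchaudio version string.

    Assumption: libmad before v0.12, FFmpeg from v0.12 on.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison handles both 0.x and later major versions.
    return "ffmpeg" if (major, minor) >= (0, 12) else "libmad"


# Example version strings; local build suffixes such as "+cu113" are ignored.
print(mp3_decoder_for("0.11.0+cu113"))
print(mp3_decoder_for("0.12.1"))
```

In practice the argument would be torchaudio.__version__. If the result is "libmad", upgrading torchaudio or decoding the MP3 to WAV first (as in the question) are the two options this thread suggests.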