Audio arrays loaded with torchaudio and librosa have different lengths in Python

3.1k Views Asked by aaaaa At 27 July 2025 at 15:25

I loaded an MP3 file in Python with both torchaudio and librosa:

import torchaudio
import librosa

filename='example.mp3'
array_tor, sample_rate_tor = torchaudio.load(filename,format='mp3')
array_lib, sample_rate_lib = librosa.load(filename, sr=sample_rate_tor)
print(len(array_tor.numpy()[0]), len(array_lib))  # the two lengths differ

The two arrays have different lengths. What causes the difference, and how can I make them the same?

If I convert example.mp3 to a WAV file with

from pydub import AudioSegment
audSeg = AudioSegment.from_mp3('example.mp3')
audSeg.export('example.wav', format="wav")

and load the WAV file with torchaudio, librosa, and soundfile

import torchaudio
import librosa
import soundfile as sf
filename='example.wav'
array_tor_w, sample_rate_tor_w = torchaudio.load(filename,format='wav')
array_lib_w, sample_rate_lib_w = librosa.load(filename, sr=sample_rate_tor_w)
array_sfl_w, sample_rate_sfl_w = sf.read(filename)
print(len(array_tor_w.numpy()[0]), len(array_lib_w), len(array_sfl_w))  # all three lengths match

The three arrays have the same length and content, and that length also matches len(array_lib) from the MP3 file.

So it seems that torchaudio.load() treats MP3 files differently.
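To put a number on the mismatch, a small helper can report how many extra samples one decoder returned and how long that is in milliseconds. This is only a diagnostic sketch: the function name length_gap is hypothetical, and it works from the lengths and sample rate alone, so it does not say where in the signal the extra samples sit.

```python
def length_gap(n_first: int, n_second: int, sample_rate: int) -> tuple:
    """Return (extra_samples, extra_milliseconds) of the first array over the second.

    A positive result means the first array is longer than the second.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    extra = n_first - n_second
    return extra, 1000.0 * extra / sample_rate


# Usage with the arrays from the question (assumes they are already loaded):
#   extra, ms = length_gap(array_tor.shape[1], len(array_lib), sample_rate_tor)
#   print(f"torchaudio returned {extra} more samples ({ms:.2f} ms)")
```

A gap of a few thousand samples or less would be consistent with decoder padding rather than a genuinely different file, but that interpretation is an assumption to check against the actual numbers.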


There is 1 best solution below

moto On 19 July 2022 at 16:54

This is due to the underlying decoder library torchaudio uses.

Up until v0.11, torchaudio used libmad, which does not remove the extra padding when decoding MP3.

See https://github.com/pytorch/audio/issues/1500 for details.

In v0.12, torchaudio switched its MP3 decoder to FFmpeg, so the padding issue should be resolved.
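To check which side of that change an installation is on, something like the following could be used. It is a minimal sketch: the helper names parse_version and mp3_padding_expected are hypothetical, and the v0.12 cutoff is taken directly from the statement above rather than verified per release.

```python
import re


def parse_version(version: str) -> tuple:
    """Return (major, minor) from a version string such as '0.11.0+cu113'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def mp3_padding_expected(version: str) -> bool:
    """True if the version predates v0.12, the release named above as
    switching the MP3 decoder to FFmpeg (assumed cutoff)."""
    return parse_version(version) < (0, 12)


# Usage (assumes torchaudio is installed):
#   import torchaudio
#   print(mp3_padding_expected(torchaudio.__version__))
```

If this returns True, upgrading torchaudio or decoding from the converted WAV file, as in the question, are the two routes the thread points to.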
