To support decoding 'mp3' audio files, please install 'sox'


I'm trying to build an ASR model using transfer learning on the wav2vec 2 model. Whenever I try to show or modify an audio file, I get this problem:

def prepare_dataset(batch):
    audio = batch["audio"]

    # batched output is "un-batched"
    batch["input_values"] = processor(audio["array"], sampling_rate=audio["sampling_rate"]).input_values[0]
    batch["input_length"] = len(batch["input_values"])
    
    with processor.as_target_processor():
        batch["labels"] = processor(batch["sentence"]).input_ids
    return batch

common_voice_train = common_voice_train.map(prepare_dataset, remove_columns=common_voice_train.column_names)
common_voice_test = common_voice_test.map(prepare_dataset, remove_columns=common_voice_test.column_names)

The errors:

RuntimeError: Backend "sox_io" is not one of available backends: ['soundfile'].
ImportError: To support decoding 'mp3' audio files, please install 'sox'.

These are my PyTorch and torchaudio versions:

import torch
import torchaudio

print(torch.__version__)
print(torchaudio.__version__)
1.13.1+cu117
0.13.1+cu117

I really need help fixing this problem; this is part of my junior project! )':

I've tried reinstalling PyTorch and installing different versions, but nothing worked. The code works fine in Colab, but it's impossible for me to train it there, so I have to use Visual Studio Code...


There are 2 best solutions below

1
moto On BEST ANSWER

TorchAudio v2.1 and later (added September 2023)

In TorchAudio v2.1, the sox binding was switched to dynamic linking. This means that users need to install libsox separately, and one way is pip install sox.

Before TorchAudio v2.1 (the original answer)

First, note that the second error message is not from torchaudio and it's not accurate. TorchAudio does not depend on an external sox package.

TorchAudio provides limited IO features on Windows, as libsox does not compile on Windows with VS2019. This situation is being worked on, but as of v0.13, Windows users need a workaround.

A simple way is to use another library such as soundfile and convert the decoded NumPy ndarray into a PyTorch Tensor.

Another way is to install FFmpeg and use torchaudio.io.StreamReader. You can write your own load function by following this tutorial:

https://pytorch.org/audio/0.13.1/tutorials/streamreader_basic_tutorial.html#sphx-glr-tutorials-streamreader-basic-tutorial-py

0
Ladislav Vašina On

To fix your issue, do the following:

Install the regular sox library:

pip install sox

or install it with your system package manager (if you are not on Fedora/RHEL, replace dnf with your distribution's tool, for example apt on Ubuntu):

sudo dnf install sox

If this does not help, you may also need to install the SoX development package:

sudo dnf install sox-devel

With TorchAudio >= 2.1.0, TorchAudio needs to be able to find libsox.so among your system libraries.

You can check where libsox.so is located using:

find / -name libsox.so 2>/dev/null
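A similar check can be sketched from Python with the standard library's ctypes.util.find_library, which asks the system's usual library lookup instead of scanning the whole filesystem. This only approximates what TorchAudio does when it loads libsox, so a hit here is a hint, not a guarantee; the helper name find_libsox is hypothetical.

```python
import ctypes.util


def find_libsox():
    """Return the library name or path resolved for libsox, or None.

    Hypothetical helper built on the standard-library lookup.
    """
    return ctypes.util.find_library("sox")


location = find_libsox()
if location is None:
    print("libsox was not found by the standard library lookup")
else:
    print(f"libsox resolved as: {location}")
```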