How do I load a bytes object WAV audio file in torchaudio?


I am trying to load a bytes object named "audio" with torchaudio:

import torchaudio

def convert_audio(audio, target_sr: int = 16000):
    # audio is a bytes object holding the WAV data
    wav, sr = torchaudio.load(audio)
    # (...) some other code

I cannot find any documentation online on how to load a bytes audio object in torchaudio; it seems to accept only path strings. I need to keep I/O to a minimum in my application, so I cannot write .wav files and load them back. I can only handle the audio objects directly.

Does anyone have a suggestion in this case?

If I use audio directly, I get this error:

Exception has occurred: AttributeError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
'bytes' object has no attribute 'seek'. You can only torch.load from a file that is seekable. Please pre-load the data into a buffer like io.BytesIO and try to load from it instead.
  File "/home/felipe/.local/lib/python3.10/site-packages/torch/serialization.py", line 348, in _check_seekable
    f.seek(f.tell())

With BytesIO:

Exception has occurred: UnpicklingError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
invalid load key, '\x00'.
  File "/home/felipe/.local/lib/python3.10/site-packages/torch/serialization.py", line 1002, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
  File "/home/felipe/.local/lib/python3.10/site-packages/torch/serialization.py", line 795, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/felipe/Coding projects/silero/stt.py", line 35, in convert_audio
    wav,sr = torch.load(io.BytesIO(audio))
  File "/home/felipe/Coding projects/silero/stt.py", line 60, in transcribe
    input = prepare_model_input(convert_audio(audio),
  File "/home/felipe/Coding projects/silero/psgui.py", line 97, in <module>
    transcripton = stt.transcribe('en',audio)
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,

There are 2 best solutions below

2
On

If it's WAV format, torchaudio.load should be able to decode it from a file-like object. Your code snippet looks good to me.

The following tutorial demonstrates it with different file-like objects.

https://pytorch.org/audio/0.13.0/tutorials/audio_io_tutorial.html#loading-from-file-like-object
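The "if it's WAV format" premise can be checked on the bytes themselves before they reach a decoder. Below is a minimal sketch, assuming the plain RIFF/WAVE layout; looks_like_wav is a hypothetical helper name, not part of torchaudio. The UnpicklingError in the question reports '\x00' as the first byte it read, which may hint that the buffer holds raw samples rather than a WAV container, but that is only a guess from the trace.

```python
def looks_like_wav(data: bytes) -> bool:
    """Return True if data starts with a plain RIFF/WAVE header.

    Layout assumed: bytes 0-3 are b"RIFF", bytes 4-7 are the chunk size,
    bytes 8-11 are b"WAVE". Variants such as RF64 are not covered.
    """
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"
```

If this returns False for the bytes in question, the problem is likely the data itself (for example headerless PCM) and not the way it is passed to torchaudio.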

Still, there are many reasons it might not work. For example, is your file-like object's cursor pointing at the correct position (the beginning of the audio data)? Does the read method conform to the io.RawIOBase.read protocol?

It's hard to tell without seeing the error stack trace.
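The cursor-position concern can be illustrated with the standard library alone. This is a small sketch with placeholder bytes (payload is made up for illustration): a buffer that was filled with write() leaves its cursor at the end, so a subsequent read returns nothing until the buffer is rewound.

```python
import io

payload = b"placeholder audio bytes"  # stand-in data, not a real WAV file

buffer = io.BytesIO()
buffer.write(payload)

# After writing, the cursor sits at the end of the data.
position_after_write = buffer.tell()
leftover = buffer.read()  # nothing left to read from this position

# Rewinding puts the cursor back at the start of the data.
buffer.seek(0)
rewound = buffer.read()

# A buffer constructed directly from bytes starts at position 0.
fresh_position = io.BytesIO(payload).tell()
```

A decoder handed the buffer before the seek(0) call would see an empty stream, which is why constructing io.BytesIO(data) directly, or rewinding first, matters.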

3
On

You need to wrap it in a file-like object first:

from io import BytesIO
import torchaudio

result = b'xxxxx'  # placeholder for the real WAV bytes

# wrap the bytes in a file-like object
wav_file_bytesIO = BytesIO(result)

data, sr = torchaudio.load(wav_file_bytesIO)
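One detail worth noting from the question: both tracebacks pass through torch/serialization.py, and the failing line is torch.load(io.BytesIO(audio)). torch.load deserializes pickled tensors and is not an audio decoder, which would explain the UnpicklingError. The sketch below wraps the approach above in a function that calls torchaudio.load instead. It assumes a torchaudio version whose load accepts file-like objects, as the linked 0.13 tutorial describes; load_wav_bytes is a hypothetical helper name.

```python
import io


def load_wav_bytes(audio: bytes):
    """Decode in-memory WAV bytes; returns (waveform, sample_rate)."""
    # Imported here so the sketch can be imported without torchaudio installed.
    import torchaudio

    # A BytesIO built from bytes starts with its cursor at position 0.
    buffer = io.BytesIO(audio)

    # torchaudio.load decodes audio; torch.load would try to unpickle instead.
    return torchaudio.load(buffer)
```

Inside convert_audio this would replace the torch.load call, for example wav, sr = load_wav_bytes(audio), with no temporary file written to disk.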