Using a float32 array to transcribe sound with OpenAI Whisper's transcribe() in Python


I am trying to capture audio from the microphone using the sounddevice module's rec() function, store it as float32, and feed it to Whisper's transcribe(). I don't want to save the audio to a file and load it back.

Is there a way to pass a float32 array directly to the transcribe() function?

Here is my attempt. I tried converting the data with np.array(), but that failed as well.

import sounddevice as sd
import numpy as np
import whisper

duration = 15
samplerate = 44100

frames = duration * samplerate

recording = sd.rec(frames, blocking=True, dtype='float32')

model = whisper.load_model("tiny")
rec_array = np.array(recording,dtype=np.float32)
result = model.transcribe(recording,word_timestamps=True, fp16=False)
text = result["text"].strip()
print(text)

The error:

RuntimeError: [enforce fail at alloc_cpu.cpp:80] data. DefaultCPUAllocator: not enough memory: you tried to allocate 846721764000 bytes.

That is 847 gigabytes, so I have clearly broken something.
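
A rough check of where that number could come from, offered as a guess rather than a confirmed diagnosis: sd.rec() returns a 2-D array of shape (frames, channels), and Whisper's log_mel_spectrogram() pads the last axis by N_SAMPLES (480000, i.e. 30 seconds at 16 kHz). If the recording was (441000, 1), the padded tensor would be (441000, 480001) float32 values. The frame count and channel count below are assumptions chosen because they reproduce the byte count exactly; 441000 frames would be 10 seconds at 44100 Hz, not the 15 seconds in the snippet above.

```python
N_SAMPLES = 480000        # whisper.audio.N_SAMPLES: 30 s at 16 kHz
BYTES_PER_FLOAT32 = 4

# Assumed shape of the recording: 10 s at 44100 Hz, one channel.
frames, channels = 441000, 1

# Padding is applied to the last axis, so each frame becomes a row
# of (channels + N_SAMPLES) values instead of the signal growing by 30 s.
padded_shape = (frames, channels + N_SAMPLES)
requested_bytes = padded_shape[0] * padded_shape[1] * BYTES_PER_FLOAT32

print(requested_bytes)  # 846721764000
```

If this reading is right, the array is being treated as 441000 separate one-sample signals rather than one 441000-sample signal.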

Edit: full traceback

Traceback (most recent call last):
  File "d:\Seshrut\Error-505!!\projects\learn whisper\soundbreak.py", line 34, in <module>
    result = model.transcribe(recording,word_timestamps=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\seshr\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\whisper\transcribe.py", line 133, in transcribe
    mel = log_mel_spectrogram(audio, model.dims.n_mels, padding=N_SAMPLES)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\seshr\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\whisper\audio.py", line 146, in log_mel_spectrogram
    audio = F.pad(audio, (0, padding))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: [enforce fail at alloc_cpu.cpp:80] data. DefaultCPUAllocator: not enough memory: you tried to allocate 846721764000 bytes.
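
A minimal sketch of one way to pass the recording straight to transcribe(), under some assumptions: transcribe() accepts a NumPy array as well as a file path, but it expects a 1-D float32 signal sampled at 16 kHz, while sd.rec() returns a 2-D (frames, channels) array at whatever rate was requested. The helper names to_whisper_audio and record_and_transcribe are made up for this example, and recording directly at 16 kHz assumes the input device supports that rate.

```python
import numpy as np

WHISPER_SAMPLE_RATE = 16000  # Whisper works on 16 kHz mono audio


def to_whisper_audio(recording: np.ndarray) -> np.ndarray:
    """Turn a (frames, channels) recording into a 1-D float32 array."""
    audio = np.asarray(recording, dtype=np.float32)
    if audio.ndim == 2:
        # Average the channels; for a (frames, 1) array this just drops the axis.
        audio = audio.mean(axis=1)
    return np.ascontiguousarray(audio, dtype=np.float32)


def record_and_transcribe(duration: float = 15.0) -> str:
    # Imported here so the helper above can be used without audio hardware.
    import sounddevice as sd
    import whisper

    frames = int(duration * WHISPER_SAMPLE_RATE)
    # Pass samplerate and channels explicitly instead of relying on device defaults.
    recording = sd.rec(frames, samplerate=WHISPER_SAMPLE_RATE, channels=1,
                       dtype="float32", blocking=True)

    model = whisper.load_model("tiny")
    result = model.transcribe(to_whisper_audio(recording), fp16=False)
    return result["text"].strip()
```

Calling print(record_and_transcribe()) would record 15 seconds and print the text. If the device cannot record at 16 kHz, the audio would need to be resampled to 16 kHz before it is passed to transcribe(); that step is not shown here.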
