Python speech to text from microphone stream


I want to program my own speech assistant with Python and run it on a Raspberry Pi later on. My first step is to transcribe speech from a microphone stream: the speech my microphone receives should be converted to text immediately, so that I can then check this text for signal words such as "Hey Siri".

I have already tried most of the STT APIs, such as SpeechRecognition, Whisper and Google Cloud Speech-to-Text. The problem with all of them was that they did not transcribe during the stream. For example, SpeechRecognition waited until I stopped speaking. The recorded audio was then sent to the servers and transcribed, which took a very long time.

Any ideas?


There is 1 best solution below


The specific problem you are trying to solve here is real-time transcription of streaming audio. The SpeechRecognition library for Python can do this, but it requires some additional work. See this question for more information.
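
As a rough illustration of that additional work, here is a minimal sketch that assumes the `SpeechRecognition` package (plus PyAudio for microphone access) is installed. It uses `listen_in_background` with a short `phrase_time_limit`, so audio is transcribed in small chunks on a background thread instead of waiting for a long pause. This is chunked rather than truly word-by-word streaming. The helper names (`contains_wake_word`, `make_callback`, `main`), the 3-second limit, and the choice of `recognize_google` as the backend are assumptions made for the example, not something from the question.

```python
import time

# Assumed wake phrases for this example, stored in lower case.
WAKE_WORDS = ("hey siri",)


def contains_wake_word(text, wake_words=WAKE_WORDS):
    """Return True if any wake phrase occurs in the transcribed text."""
    # Lower-case and collapse whitespace so the match is forgiving.
    normalized = " ".join(text.lower().split())
    return any(phrase in normalized for phrase in wake_words)


def make_callback(on_wake):
    """Build the callback that listen_in_background runs for each audio chunk."""
    def callback(recognizer, audio):
        import speech_recognition as sr  # third-party: SpeechRecognition
        try:
            # Sends the chunk to Google's free web API (needs internet access).
            text = recognizer.recognize_google(audio)
        except sr.UnknownValueError:
            return  # nothing intelligible in this chunk
        except sr.RequestError as exc:
            print(f"Recognition request failed: {exc}")
            return
        if contains_wake_word(text):
            on_wake(text)
    return callback


def main():
    import speech_recognition as sr  # third-party: SpeechRecognition
    recognizer = sr.Recognizer()
    microphone = sr.Microphone()
    with microphone as source:
        # Calibrate the energy threshold against background noise once.
        recognizer.adjust_for_ambient_noise(source)
    # Chunks are cut after at most 3 seconds (an assumed value), so the
    # recognizer does not wait for the speaker to finish a long utterance.
    stop_listening = recognizer.listen_in_background(
        microphone, make_callback(print), phrase_time_limit=3
    )
    try:
        while True:
            time.sleep(0.1)  # main thread stays free for other work
    except KeyboardInterrupt:
        stop_listening(wait_for_stop=False)
```

Calling `main()` starts listening and prints each chunk that contains a wake phrase. Each chunk still makes a network round trip, so on a Raspberry Pi an offline recognizer may be a better fit for wake-word detection.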