I have been using Google Speech Recognition for Python. Here is my code:
import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

print(r.recognize_google(audio))
Although the recognition is very accurate, it takes about 4-5 seconds before it spits out the recognized text. Since I am creating a voice assistant, I want to modify the above code to allow speech recognition to be much faster.
Is there any way we can lower this number to about 1-2 seconds? If possible, I am trying to make recognition as fast as services such as Siri and Ok Google.
I am very new to Python, so my apologies if there is a simple answer to my question.
You could use another speech recognition service. For example, you could set up an account with IBM to use their Watson Speech to Text. If possible, try to use their websocket interface, because it transcribes what you are saying while you are still speaking.
An example (not using websockets) would be:
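A minimal sketch of such a call, assuming the `ibm-watson` Python SDK (`SpeechToTextV1` with `IAMAuthenticator`) and audio captured with `speech_recognition` as in the question. The function names `best_transcript` and `transcribe_with_watson` are made up for illustration, and `api_key` and `service_url` stand in for your own IBM Cloud credentials:

```python
import io


def best_transcript(result):
    """Join the top alternative of every result chunk into one string."""
    chunks = result.get("results", [])
    return " ".join(
        chunk["alternatives"][0]["transcript"].strip()
        for chunk in chunks
        if chunk.get("alternatives")
    )


def transcribe_with_watson(audio, api_key, service_url):
    """Send one recorded phrase to Watson over plain HTTPS (no websockets).

    `audio` is the sr.AudioData object returned by r.listen(source).
    `api_key` and `service_url` are placeholders (assumptions) for your
    own IBM Cloud credentials.
    """
    # Imported lazily so best_transcript() works without the SDK installed.
    from ibm_watson import SpeechToTextV1
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

    stt = SpeechToTextV1(authenticator=IAMAuthenticator(api_key))
    stt.set_service_url(service_url)
    response = stt.recognize(
        audio=io.BytesIO(audio.get_wav_data()),  # WAV bytes of the phrase
        content_type="audio/wav",
    )
    return best_transcript(response.get_result())
```

You would call it as `print(transcribe_with_watson(audio, "YOUR_API_KEY", "YOUR_SERVICE_URL"))` right after `audio = r.listen(source)`. Because the whole phrase is uploaded only after you stop speaking, this is likely to be slower than the websocket interface.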
You could also try pocketsphinx, but personally, I have not had particularly good experiences with it. It works offline (a plus) but, for me, wasn't particularly accurate. You could probably tweak some detection settings and filter out some background noise. I believe there is also a training option to adapt it to your voice, but it doesn't look straightforward.
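As a rough sketch of that kind of tweaking, the snippet below shortens the silence `speech_recognition` waits for before ending a phrase, calibrates for background noise, and then recognizes offline with `recognize_sphinx` (which needs the `pocketsphinx` package installed). The helper names `tune_for_speed` and `listen_offline` and the specific values 0.5 and 0.3 are illustrative assumptions, not recommended settings:

```python
def tune_for_speed(recognizer, pause_threshold=0.5, non_speaking_duration=0.3):
    """Shorten the trailing silence required before a phrase is considered done."""
    # listen() expects pause_threshold >= non_speaking_duration >= 0.
    if not pause_threshold >= non_speaking_duration >= 0:
        raise ValueError("need pause_threshold >= non_speaking_duration >= 0")
    recognizer.pause_threshold = pause_threshold
    recognizer.non_speaking_duration = non_speaking_duration
    return recognizer


def listen_offline(calibration_seconds=0.5):
    """Record one phrase and recognize it locally; returns None if unintelligible."""
    # Imported lazily so tune_for_speed() can be used without the library.
    import speech_recognition as sr

    r = tune_for_speed(sr.Recognizer())
    with sr.Microphone() as source:
        # Sample the room briefly so the energy threshold reflects background noise.
        r.adjust_for_ambient_noise(source, duration=calibration_seconds)
        print("Say something!")
        audio = r.listen(source)
    try:
        return r.recognize_sphinx(audio)
    except sr.UnknownValueError:
        return None
```

A lower `pause_threshold` makes `listen()` return sooner after you stop talking, at the risk of cutting off slow speakers, so it is worth experimenting with the value.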
Some useful links:
Speech recognition
Microphone recognition example
IBM Watson Speech to Text
Good luck. Once speech recognition works correctly, it is very useful and rewarding!