Make Python Speech Recognition Faster

I have been using Google Speech Recognition for Python. Here is my code:

import speech_recognition as sr
r = sr.Recognizer()
with sr.Microphone() as source:
   print("Say something!")
   audio = r.listen(source)
   print(r.recognize_google(audio))

Although the recognition is very accurate, it takes about 4-5 seconds before it spits out the recognized text. Since I am creating a voice assistant, I want to modify the above code to allow speech recognition to be much faster.

Is there any way we can lower this number to about 1-2 seconds? If possible, I am trying to make recognition as fast as services such as Siri and Ok Google.

I am very new to Python, so my apologies if there is a simple answer to my question.

There are 2 best solutions below

You could use another speech recognition service. For example, you could set up an account with IBM to use their Watson Speech to Text. If possible, try to use their WebSocket interface, because it transcribes what you are saying while you are still speaking, so the text is nearly ready by the time you stop.

An example (not using websockets) would be:

import speech_recognition as sr

# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Adjusting for background noise. One second")
    r.adjust_for_ambient_noise(source)
    print("Say something!")
    audio = r.listen(source)

IBM_USERNAME = "INSERT IBM SPEECH TO TEXT USERNAME HERE"  # IBM Speech to Text usernames are strings of the form XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
IBM_PASSWORD = "INSERT IBM SPEECH TO TEXT PASSWORD HERE"  # IBM Speech to Text passwords are mixed-case alphanumeric strings
try:
    print("IBM Speech to Text thinks you said " + r.recognize_ibm(audio, username=IBM_USERNAME, password=IBM_PASSWORD))
except sr.UnknownValueError:
    print("IBM Speech to Text could not understand audio")
except sr.RequestError as e:
    print("Could not request results from IBM Speech to Text service; {0}".format(e))

You could also try pocketsphinx, but personally, I have not had particularly good experiences with it. It works offline (a plus) but, for me, wasn't particularly accurate. You could probably tweak some detection settings and cancel out some background noise. I believe there is also a training option to adapt it to your voice, but it doesn't look straightforward.
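
To compare engines such as the ones mentioned above, it can help to time each call rather than guess where the delay comes from. The following is only a minimal sketch: `timed` and `fake_recognizer` are hypothetical names introduced for illustration, and `fake_recognizer` is a stand-in so the snippet runs without a microphone or network. In real use you would pass a recognizer method such as `r.recognize_google` and the `audio` object instead.

```python
import time


def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_recognizer(audio):
    # Hypothetical stand-in for a real call such as r.recognize_google(audio);
    # the sleep simulates the network round trip.
    time.sleep(0.05)
    return "hello world"


text, seconds = timed(fake_recognizer, b"raw-audio-bytes")
print("{!r} took {:.2f} s".format(text, seconds))
```

Timing `r.listen(source)` and the recognition call separately should show whether most of the 4-5 seconds is spent waiting for the end of the phrase or waiting for the service.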

Some useful links:

Speech recognition

Microphone recognition example

IBM Watson Speech to Text

Good luck. Once speech recognition works correctly, it is very useful and rewarding!

Select the proper input device, adjust for ambient noise, and cap the phrase length for best results:

import speech_recognition as sr

def speech_to_text():
    # Prefer the PulseAudio input if one is listed; None selects the default microphone.
    required = None
    for index, name in enumerate(sr.Microphone.list_microphone_names()):
        if "pulse" in name:
            required = index
    r = sr.Recognizer()
    with sr.Microphone(device_index=required) as source:
        r.adjust_for_ambient_noise(source)
        print("Say something!")
        # Stop recording after at most 4 seconds so recognition can start sooner.
        audio = r.listen(source, phrase_time_limit=4)
    try:
        text = r.recognize_google(audio)  # named "text" to avoid shadowing the built-in input()
        print("You said: " + text)
        return text
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition service; {0}".format(e))
    return None
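
Part of the delay may also come from how long the recognizer waits for silence before it decides the phrase has ended. In the SpeechRecognition library, `pause_threshold` (default 0.8 seconds) and `non_speaking_duration` (default 0.5 seconds) control this, and `pause_threshold` must not be smaller than `non_speaking_duration`. The sketch below is an assumption-laden illustration, not a recommended configuration: `tune_for_low_latency` is a hypothetical helper, the values 0.5 and 0.3 are guesses you would need to tune, and a `SimpleNamespace` stands in for `sr.Recognizer()` so the snippet runs without the library installed.

```python
from types import SimpleNamespace


def tune_for_low_latency(recognizer, pause=0.5, non_speaking=0.3):
    """Shorten the silence required to end a phrase (hypothetical helper)."""
    # The library requires pause_threshold >= non_speaking_duration >= 0.
    if not 0 <= non_speaking <= pause:
        raise ValueError("need 0 <= non_speaking <= pause")
    recognizer.non_speaking_duration = non_speaking
    recognizer.pause_threshold = pause
    return recognizer


# Stand-in object with the library's default values; in real use,
# pass an actual sr.Recognizer() instance instead.
r = tune_for_low_latency(SimpleNamespace(pause_threshold=0.8, non_speaking_duration=0.5))
print(r.pause_threshold, r.non_speaking_duration)
```

Lower values end the recording sooner but risk cutting off speakers who pause mid-sentence. Similarly, `r.adjust_for_ambient_noise(source, duration=0.5)` shortens the calibration step from its default of one second, at the cost of a less reliable noise estimate.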