Python SpeechRecognition having trouble processing short spoken words


I have a project that uses the SpeechRecognition module. I noticed that the recognizer has trouble processing short spoken words such as "next", "search", and "write". When I use these words in a sentence like "Write something....", it has no trouble processing the mic input; I think that is because the sentence stretches the audio past a certain duration. By 'trouble' I mean that when I say, for example, "next", which is short, the mic still waits for a while before closing and processing the input. This is a problem because in my project I usually need to say these words on their own, not in a sentence.

To work around this, I stretch out such words to reach that "audio duration". For example, I pronounce 'next' as "neeeeeeeext" or 'search' as "seeeeeaaarch". I don't want to do that because I sound stupid saying the commands that way. Any suggestions on how I can fix this? Or am I just missing something? Thank you in advance!

P.S. I use a separate speech recognizer tool, not the recognizers in the SpeechRecognition module. I mainly use the SpeechRecognition module for its mic support and its features. I also don't use PyAudio because of latency issues.

There is 1 solution below

Noether On

I cannot provide an exact answer to this question, but I recommend trying a Hugging Face model, for example Whisper small (openai/whisper-small). The model is fairly small, so in theory it should run on your CPU without much delay.

Here is an example of how you would need to call it:

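# Use a pipeline as a high-level helper
from transformers import pipeline

# The model weights are downloaded on first use
pipe = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# "command.wav" is a placeholder for your own recording; reading a file path requires ffmpeg
result = pipe("command.wav")
print(result["text"])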
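
Separately from the choice of model, the wait you describe looks like silence-based endpointing: the listener closes a recording only after enough trailing silence, and it may discard speech that is shorter than a minimum length. Below is a simplified model of that idea, not the library's actual code. The function name capture_close_ms, the 50 ms frame size, and the energy values are all made up for illustration; only the concepts of a pause threshold and a minimum phrase length are taken from how SpeechRecognition documents its listener.

```python
from typing import List, Optional


def capture_close_ms(
    energies: List[float],
    frame_ms: int = 50,
    energy_threshold: float = 300.0,
    pause_ms: int = 800,
    phrase_ms: int = 300,
) -> Optional[int]:
    """Return the time (ms) at which a phrase is accepted, or None if none is.

    energies holds one loudness value per frame (hypothetical input).
    """
    pause_frames = -(-pause_ms // frame_ms)    # ceiling division
    phrase_frames = -(-phrase_ms // frame_ms)
    i, n = 0, len(energies)
    while i < n:
        # Wait for the first frame louder than the threshold.
        while i < n and energies[i] <= energy_threshold:
            i += 1
        if i >= n:
            return None
        phrase_count = pause_count = 0
        while i < n:
            phrase_count += 1
            if energies[i] > energy_threshold:
                pause_count = 0
            else:
                pause_count += 1
            i += 1
            if pause_count > pause_frames:
                break  # enough trailing silence: the phrase has ended
        else:
            return None  # audio ran out before the silence requirement was met
        # Ignore the trailing silence when measuring how long the speech was.
        if phrase_count - pause_count >= phrase_frames:
            return i * frame_ms
        # Speech was too short: drop it and keep listening.
    return None


# 200 ms of speech ("next") versus 500 ms of speech ("neeeext"), then silence.
short_word = [0.0] * 5 + [1000.0] * 4 + [0.0] * 40
long_word = [0.0] * 5 + [1000.0] * 10 + [0.0] * 40

print(capture_close_ms(short_word))                              # None
print(capture_close_ms(long_word))                               # 1600
print(capture_close_ms(short_word, phrase_ms=100))               # 1300
print(capture_close_ms(short_word, pause_ms=300, phrase_ms=100)) # 800
```

If this matches what you are seeing, the settings to experiment with are the Recognizer attributes phrase_threshold and pause_threshold (as far as I know they default to 0.3 and 0.8 seconds), set before calling listen(). Keep pause_threshold at or above non_speaking_duration, since listen() checks that.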