80-second delay using Google Cloud SpeechRecognition with Python 3.9 on RPi 3B+


I'm using the SpeechRecognition code from PyPI (https://pypi.org/project/SpeechRecognition/):

  • cleaned up to use only Google Cloud Speech recognition;
  • Google JSON credentials set in the shell's environment, and working.

I've enabled the Cloud Speech-to-Text API, got the JSON credentials, and the service calls ARE hitting the API. The microphone is fine, and the recording step finishes quickly.

However, it's taking a full 80 seconds to perform the API call!

I've monitored the network traffic: the API connection sits idle for about 78 seconds, then transmits and receives everything very quickly in the final 2 seconds. How can I speed this up?

Could it be a slow authentication handshake that I can fix?

MORE INFORMATION: my application performs 3 API calls: Google Speech-to-Text | Google Translate (text-to-text) | Google Text-to-Speech. Those API calls ALWAYS take 80 s, 20 s and 80 s respectively.

Thanks a mil!

The delay happens on the second-to-last line here:

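print("0 seconds")
try:
    print("Google Cloud Speech thinks you said " + r.recognize_google_cloud(audio))
except sr.UnknownValueError:
    print("Google Cloud Speech could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Google Cloud Speech service; {0}".format(e))
print("80th second")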

Answer by usermajuser (accepted):

Problem solved! It was the SSL certs; bouquets to @Dean Van Greune and @VonC. My fibre router (Sagemcom) was blocking the Pi's SSL certs, or forcing them to a different port, creating massive delays. I remember solving the same problem for JavaMail TLS a while back, and wanting to take a bat to the router ("Office Space"-style). After switching to the hotspot on my phone, everything now runs lightning fast. Thanks for your help and suggestions, guys!

Answer by VonC:

"It was the SSL certs; my fibre router (Sagemcom) was blocking the Pi's SSL certs, or forcing it to a different port, creating massive delays."

Several of the troubleshooting steps below relate to network configuration, the SSL/TLS handshake, and detailed network analysis, and should have pointed to that cause:

  • openssl to test the SSL connection time to Google's API
  • Wireshark, for network-related issues
  • changing the network environment (which helps isolate the problem)

Troubleshooting steps

I assume the Raspberry Pi has a stable network connection. Still, check that its DNS settings allow fast name resolution.

# Check current DNS configuration
cat /etc/resolv.conf

# Change the DNS server if necessary (e.g., to Google's 8.8.8.8)
# Note: on Raspberry Pi OS this file may be regenerated by dhcpcd or NetworkManager
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
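To check whether name resolution itself is slow, a small standard-library sketch like the one below times `socket.getaddrinfo` for each Google endpoint and lists which address families (IPv4/IPv6) come back. The hostnames `translation.googleapis.com` and `texttospeech.googleapis.com` are my assumption about which endpoints the Translate and Text-to-Speech clients use, and `resolve_timed` is a hypothetical helper name.

```python
import socket
import time

# Assumed endpoints for Speech-to-Text, Translate and Text-to-Speech.
HOSTS = [
    "speech.googleapis.com",
    "translation.googleapis.com",
    "texttospeech.googleapis.com",
]


def resolve_timed(host, port=443):
    """Resolve host; return (seconds taken, sorted list of address family names).

    Raises socket.gaierror if resolution fails.
    """
    start = time.perf_counter()
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    elapsed = time.perf_counter() - start
    # info[0] is a socket.AddressFamily enum, e.g. AF_INET or AF_INET6.
    families = sorted({info[0].name for info in infos})
    return elapsed, families
```

On the Pi, calling `resolve_timed(host)` for each name in `HOSTS` and printing the results should show lookups well under a second; a multi-second lookup, or only IPv6 results on a network without working IPv6, would be worth investigating.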

You can also measure the SSL connection time to Google's API with openssl s_client.

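# Time the TCP connect + SSL/TLS handshake (</dev/null makes s_client exit once connected)
time openssl s_client -connect speech.googleapis.com:443 -servername speech.googleapis.com < /dev/null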
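The openssl timing lumps DNS, TCP and TLS together. As a sketch using only Python's standard library, each stage can be timed separately; a stall in one stage (for example a TCP connect or TLS handshake that takes tens of seconds) would point at the network path rather than at the API. `time_connection_stages` is a hypothetical helper, and it verifies certificates against the system CA store via `ssl.create_default_context()`.

```python
import socket
import ssl
import time


def time_connection_stages(host, port=443, timeout=10.0):
    """Time DNS lookup, TCP connect and TLS handshake separately.

    Returns a dict of stage name -> seconds. If a stage fails, the
    exception is stored under "error" and later stages are skipped.
    """
    timings = {}
    sock = None
    try:
        # Stage 1: DNS resolution (use the first address returned).
        t0 = time.perf_counter()
        family, socktype, proto, _, addr = socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM)[0]
        timings["dns"] = time.perf_counter() - t0

        # Stage 2: plain TCP connect.
        t1 = time.perf_counter()
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        sock.connect(addr)
        timings["tcp_connect"] = time.perf_counter() - t1

        # Stage 3: TLS handshake with certificate verification.
        t2 = time.perf_counter()
        context = ssl.create_default_context()
        with context.wrap_socket(sock, server_hostname=host):
            timings["tls_handshake"] = time.perf_counter() - t2
    except OSError as exc:  # covers gaierror, timeouts and ssl.SSLError
        timings["error"] = repr(exc)
    finally:
        if sock is not None:
            sock.close()  # harmless if wrap_socket already took ownership
    return timings
```

Running `time_connection_stages("speech.googleapis.com")` on the Pi, once behind the router and once on another network (such as a phone hotspot), would show which stage accounts for the delay.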

Try compressing the audio if it is large, which might reduce upload time.

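# Example: compressing captured audio losslessly to FLAC before upload
# (assumes `audio` is a speech_recognition.AudioData; on a Raspberry Pi the system
# `flac` package may be needed, and recognize_google_cloud may already send FLAC)
flac_bytes = audio.get_flac_data(convert_rate=16000, convert_width=2)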

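You can also switch to an asynchronous recognition approach, if you are not already using one. That will not make the API respond faster, but it keeps the rest of the program responsive while it waits.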

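# Example: running the blocking recognize_google_cloud call in a worker thread
from concurrent.futures import ThreadPoolExecutor
executor = ThreadPoolExecutor(max_workers=1)
recognize_future = executor.submit(r.recognize_google_cloud, audio)
result = recognize_future.result()  # waits here until the API call returns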

Confirm that authentication is not causing the delay, for example by timing how long it takes to load the JSON credentials and obtain an access token.
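A minimal sketch of that check, assuming the `google-auth` and `requests` packages are installed and that the credentials come from `GOOGLE_APPLICATION_CREDENTIALS`: `google.auth.default` loads them, and `credentials.refresh` fetches an access token from Google's OAuth endpoint over TLS. If TLS connections to Google are slow, the refresh step should be slow too. `timed` and `time_token_fetch` are hypothetical helper names.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (its result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def time_token_fetch():
    """Time loading credentials and fetching an OAuth access token.

    Requires google-auth and requests; imported lazily so the timing
    helper above can be used without them.
    """
    import google.auth
    import google.auth.transport.requests

    (credentials, _project), load_s = timed(
        google.auth.default,
        scopes=["https://www.googleapis.com/auth/cloud-platform"],
    )
    _, refresh_s = timed(
        credentials.refresh, google.auth.transport.requests.Request())
    return {"load_credentials": load_s, "token_refresh": refresh_s}
```

Loading a local JSON file should be near-instant; a `token_refresh` figure in the tens of seconds would suggest the connection to Google, not the credentials themselves, is the problem.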

Profile the Python script to identify any bottlenecks.

# Install profiling tool
pip install line_profiler

# Run the profiler (kernprof -l only profiles functions decorated with @profile)
kernprof -l -v your_script.py
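If adding `@profile` decorators is inconvenient, the standard library's `cProfile` can profile a single call directly. This sketch (`profile_call` is a hypothetical name) returns the call's result plus a report sorted by cumulative time. For a call that is mostly waiting on the network, the report would likely show the time concentrated in socket or SSL read calls rather than in Python code.

```python
import cProfile
import io
import pstats


def profile_call(fn, *args, top=10, **kwargs):
    """Run fn under cProfile; return (result, report of the top entries by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()
```

For example, `text, report = profile_call(r.recognize_google_cloud, audio)` followed by `print(report)`.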

"I am taking the audio directly from the mic using the SpeechRecognition code from PyPI; it's usually 2 or 3 seconds long. I also tried passing the Google API 16,000 Hz WAV files (LINEAR16) of ~170 KB, using the Google library example. Both work. Both take exactly 80 seconds. Tested over Wi-Fi at roughly maximum range and over wired Ethernet to the fibre router: 80 s each, EVERY time!"

That suggests the issue is not related to the audio format or network speed, but to something more specific to the interaction between your Raspberry Pi setup and the Google Cloud Speech-to-Text API.

Make sure the audio parameters are consistently set to the optimal values for Google's API (e.g., 16kHz sampling rate, Linear16 format).
You could add a step in your Python script to always convert the audio to this format before sending it to the API.

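# Example: converting captured audio to 16 kHz, 16-bit (LINEAR16) PCM
# (assumes `audio` is a speech_recognition.AudioData)
linear16_bytes = audio.get_raw_data(convert_rate=16000, convert_width=2)
wav_bytes = audio.get_wav_data(convert_rate=16000, convert_width=2)  # same samples plus a WAV header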
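As a sanity check on size: 16 kHz mono 16-bit audio is 16,000 × 2 = 32,000 bytes per second, so a ~170 KB LINEAR16 file holds only about 5 seconds of audio. An upload that small is unlikely to explain an 80-second call by itself. The sketch below uses the standard `wave` module to confirm a WAV file's parameters before sending; `describe_wav` and `is_linear16_16k_mono` are hypothetical names.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Return basic parameters of an in-memory PCM WAV file."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": wav.getnframes() / rate,
        }


def is_linear16_16k_mono(params: dict) -> bool:
    """True if the WAV is 16 kHz, mono, 16-bit PCM."""
    return (params["sample_rate"] == 16000
            and params["channels"] == 1
            and params["sample_width_bytes"] == 2)
```

With SpeechRecognition, `describe_wav(audio.get_wav_data())` reports what was captured from the microphone; for a file on disk, pass its bytes instead.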

Since the problem persists across different connections (Wi-Fi and wired Ethernet), the issue might lie in how the API request is constructed or handled. Try reducing the complexity of the request or breaking it into smaller parts.

There might be specific configurations or limitations within the Raspberry Pi causing the delay. Check for any background processes or resource limitations that could be impacting the performance.


"My application performs 3 API calls: Google Speech-to-Text | Google Translate (text-to-text) | Google Text-to-Speech. Those API calls ALWAYS take 80 seconds, 20 s and 80 s respectively."

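The consistent and specific durations of the delays in your application's API calls to Google services (80 seconds for Speech-to-Text, 20 seconds for Translate, and 80 seconds for Text-to-Speech) suggest a pattern rooted in how these calls are handled, either on the Raspberry Pi or within the network infrastructure. Delays this fixed and repeatable usually indicate a timeout somewhere (for example, a connection attempt that stalls until a fallback succeeds) rather than actual processing time.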
Use monitoring tools to observe CPU usage, memory, and network activity on the Raspberry Pi during the API calls. That might reveal resource bottlenecks.
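A rough Linux-only sketch of such monitoring (it reads `/proc/net/dev` and uses `os.getloadavg`): run the API call in a worker thread and sample the load average and total network byte counters while it runs. A flat byte count with a low load for most of the call, followed by a burst at the end, would point at waiting on the network rather than at the Pi's CPU. `read_net_bytes` and `monitor_while` are hypothetical names.

```python
import os
import threading
import time


def read_net_bytes(path="/proc/net/dev"):
    """Return total (rx_bytes, tx_bytes) over all non-loopback interfaces."""
    rx_total = tx_total = 0
    with open(path) as f:
        for line in f.readlines()[2:]:  # first two lines are headers
            iface, data = line.split(":", 1)
            if iface.strip() == "lo":
                continue
            fields = data.split()
            rx_total += int(fields[0])  # receive bytes
            tx_total += int(fields[8])  # transmit bytes
    return rx_total, tx_total


def monitor_while(fn, *args, interval=1.0):
    """Run fn(*args) in a thread, sampling load and network counters.

    Returns (fn's result, samples); each sample is
    (elapsed seconds, 1-minute load average, rx_bytes, tx_bytes).
    Exceptions raised by fn are re-raised in the caller.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = fn(*args)
        except BaseException as exc:
            outcome["error"] = exc

    def sample():
        rx, tx = read_net_bytes()
        return (time.perf_counter() - start, os.getloadavg()[0], rx, tx)

    start = time.perf_counter()
    thread = threading.Thread(target=worker)
    thread.start()
    samples = []
    while thread.is_alive():
        samples.append(sample())
        thread.join(timeout=interval)
    samples.append(sample())  # final reading after the call has finished
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"], samples
```

For example, `text, samples = monitor_while(r.recognize_google_cloud, audio)` and then print `samples` to see when the bytes actually move.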

Regarding Google Speech-to-Text (80-second delay): as mentioned before, run the call asynchronously so the rest of the program is not blocked while it waits. Also consider breaking the audio data into smaller segments before sending it to the API, since smaller chunks might be processed more quickly.
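A minimal sketch of the splitting step, assuming raw LINEAR16 PCM bytes (for example from `audio.get_raw_data(convert_rate=16000, convert_width=2)`). `split_pcm` is a hypothetical helper; it cuts on whole-frame boundaries so no sample is split, but it does not look for silence, so a word may be cut across two chunks. Given a fixed 80-second delay per call, it is also uncertain whether more, smaller requests would help.

```python
def split_pcm(raw: bytes, sample_rate: int = 16000, sample_width: int = 2,
              channels: int = 1, seconds_per_chunk: float = 1.0) -> list:
    """Split raw PCM bytes into chunks of about seconds_per_chunk each.

    Chunk boundaries are aligned to whole frames; a trailing partial
    frame (if any) is dropped.
    """
    frame_size = sample_width * channels
    frames_per_chunk = max(1, int(sample_rate * seconds_per_chunk))
    chunk_bytes = frames_per_chunk * frame_size
    usable = len(raw) - (len(raw) % frame_size)
    trimmed = raw[:usable]
    return [trimmed[i:i + chunk_bytes] for i in range(0, usable, chunk_bytes)]
```

Each chunk can be wrapped back into `sr.AudioData(chunk, 16000, 2)` before passing it to the recognizer.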
Double-check that the audio format and sampling rate are optimized for Google's API.

For Google Translate (20 seconds delay), check if the size of the text being translated affects the response time. Try with shorter and longer texts to see if there is a pattern. Since this service seems to have a shorter delay, it might be more related to network latency. A detailed network analysis might reveal more insights.

And for Google Text-to-Speech (80 seconds delay), test with varying complexities of text to see if it affects the processing time. Similar to Speech-to-Text, make sure the request is optimally formatted and does not contain unnecessary data or headers.
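For both the Translate and Text-to-Speech tests, a small timing loop over inputs of different lengths makes the pattern easy to see. In this sketch, `time_by_size` is a hypothetical helper, and `fn` would be a wrapper of your own around the client-library call (for example a hypothetical `translate_text(text)` or `synthesize(text)`). If the time stays roughly constant regardless of length, that points to fixed overhead such as connection setup rather than processing.

```python
import time


def time_by_size(fn, inputs, repeats=3):
    """Call fn on each input `repeats` times; return [(len(input), best seconds)].

    Taking the best of several runs reduces noise from one-off hiccups.
    """
    results = []
    for item in inputs:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(item)
            best = min(best, time.perf_counter() - start)
        results.append((len(item), best))
    return results
```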

For all three services, the fact that both Speech-to-Text and Text-to-Speech take exactly 80 seconds may indicate a common bottleneck, possibly in the way audio data is handled. Determine how much time is spent processing data locally on the Raspberry Pi versus the time taken in cloud processing and network transmission. That could be done by timing the different stages of your application's workflow.
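One way to time the stages is a small context-manager timer. This is a sketch; `StageTimer` and the stage names are hypothetical, and the commented usage assumes the SpeechRecognition `r.listen(source)` and `r.recognize_google_cloud(audio)` calls from the question.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Collect wall-clock durations for named stages of a workflow."""

    def __init__(self):
        self.durations = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Accumulate, so a stage entered several times is summed.
            elapsed = time.perf_counter() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def report(self):
        lines = [f"{name:<20}{secs:8.2f}s" for name, secs in self.durations.items()]
        lines.append(f"{'total':<20}{sum(self.durations.values()):8.2f}s")
        return "\n".join(lines)


# Usage sketch:
# timer = StageTimer()
# with timer.stage("record"):
#     audio = r.listen(source)
# with timer.stage("speech_to_text"):
#     text = r.recognize_google_cloud(audio)
# print(timer.report())
```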
And verify that your credentials (here, the service-account JSON key) are correctly configured and that you are not hitting any usage quotas or limits that might throttle your requests.
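For the credentials part, a quick standard-library check can confirm that `GOOGLE_APPLICATION_CREDENTIALS` points to a readable service-account JSON file containing the usual fields. This is a sketch under the assumption that a service-account key is used, as in the question; `check_credentials_file` is a hypothetical name, and it does not check quotas, which are shown per API in the Google Cloud console.

```python
import json
import os


def check_credentials_file(env_var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Return a list of problems with the service-account JSON named by env_var (empty = looks OK)."""
    path = os.environ.get(env_var)
    if not path:
        return [f"{env_var} is not set"]
    if not os.path.isfile(path):
        return [f"{env_var} points to a missing file: {path}"]
    try:
        with open(path) as f:
            info = json.load(f)
    except (OSError, ValueError) as exc:  # unreadable file or invalid JSON
        return [f"could not load {path}: {exc}"]
    if not isinstance(info, dict):
        return [f"{path} does not contain a JSON object"]
    problems = [f"missing key: {key}"
                for key in ("type", "project_id", "client_email", "private_key")
                if key not in info]
    if info.get("type") != "service_account":
        problems.append(f"type is {info.get('type')!r}, expected 'service_account'")
    return problems
```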

If possible, run the same application on a different platform or device. If the delays significantly differ, it might indicate an issue specific to the Raspberry Pi's hardware or configuration.