Send live recording from HTML frontend to Google Cloud Speech via Flask backend


Alright, so I'm working on a class project and I'm trying to send a recording made with JavaScript's navigator.mediaDevices.getUserMedia and MediaRecorder to the backend of my web application (written in Python, Flask), and from there to the Google Speech-to-Text API (google-cloud-speech).

So far, I've gotten to the point of making a recording, but I can't seem to get it to the Google Cloud API successfully. Here's how I'm trying to do it:

  1. Use navigator.mediaDevices.getUserMedia to recognize the user's microphone
  2. Use the resulting audio stream to make a MediaRecorder object
  3. Use that recorder object to make a blob with the resulting audio (with {'type' : 'audio/flac'})
  4. Base64-encode it, write it to a hidden form element, and submit the corresponding form
  5. From there, the resulting POST request goes to my Python Flask backend, where it reads in the Base64 encoded string as a... string
  6. Attempt to use the google-cloud-speech client to decode the text
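
For reference, step 5 can be sketched roughly as below. This is only a minimal sketch under assumptions: the helper names `decode_form_audio` and `sniff_container` are hypothetical, and the field is assumed to hold either plain Base64 or a data URL. The magic-byte check is a debugging aid for confirming which container the browser actually produced, since the type passed to the Blob constructor only labels the data and does not re-encode it.

```python
import base64

# Leading "magic" bytes of containers a browser recording might arrive in.
_MAGIC = {
    b"fLaC": "flac",
    b"\x1a\x45\xdf\xa3": "webm",  # EBML header used by WebM/Matroska
    b"OggS": "ogg",
    b"RIFF": "wav",
}


def decode_form_audio(field_value: str) -> bytes:
    """Turn the Base64 text from the hidden form field into raw bytes."""
    # A data URL looks like "data:<mime>;base64,<payload>"; keep only the payload.
    if field_value.startswith("data:"):
        field_value = field_value.split(",", 1)[1]
    return base64.b64decode(field_value)


def sniff_container(audio: bytes) -> str:
    """Guess the container from the first bytes, ignoring the declared MIME type."""
    for magic, name in _MAGIC.items():
        if audio.startswith(magic):
            return name
    return "unknown"
```

In a Flask view this might be called as `sniff_container(decode_form_audio(request.form["audio_data"]))`, where `audio_data` is an assumed field name. If the result is not `flac`, the bytes being sent to the API are not FLAC, whatever the blob type says.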

It's not working. I'm using the Python library, and I can't send the Base64 string directly because the library expects bytes. I've tried Base64-decoding the string back into bytes, but when I run it through the API I always get empty ([]) results. After looking this up briefly, it seems that the sample rate could be a problem. I've attempted to set the sample rate to 16000 in the navigator.mediaDevices.getUserMedia() call, which looks like this:

navigator.mediaDevices.getUserMedia({ audio: true, sampleRate: 16000 })

and the config part of my client.recognize() call (in my Python backend) looks like this:

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
    sample_rate_hertz=int(sampleRate),
    language_code="en-US",
)
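
One way to keep the declared encoding in step with the bytes that actually arrive is to choose it from the detected container rather than hard-coding FLAC. The sketch below is an assumption-laden illustration, not a confirmed fix: `ENCODING_BY_CONTAINER`, `encoding_name_for` and `build_config` are hypothetical names, and it assumes WebM and Ogg files carry Opus audio and WAV files carry 16-bit PCM.

```python
# Hypothetical mapping from a detected container to the name of a
# speech.RecognitionConfig.AudioEncoding member.
ENCODING_BY_CONTAINER = {
    "flac": "FLAC",
    "webm": "WEBM_OPUS",  # assumes Opus audio inside WebM
    "ogg": "OGG_OPUS",    # assumes Opus audio inside Ogg
    "wav": "LINEAR16",    # assumes 16-bit PCM
}


def encoding_name_for(container: str) -> str:
    """Return the AudioEncoding member name for a container string."""
    return ENCODING_BY_CONTAINER.get(container, "ENCODING_UNSPECIFIED")


def build_config(container: str, sample_rate: int, language_code: str = "en-US"):
    """Build a RecognitionConfig whose encoding matches the container."""
    # Imported lazily so the mapping above works without the library installed.
    from google.cloud import speech

    encoding = getattr(
        speech.RecognitionConfig.AudioEncoding, encoding_name_for(container)
    )
    return speech.RecognitionConfig(
        encoding=encoding,
        sample_rate_hertz=sample_rate,
        language_code=language_code,
    )
```

The sample rate passed in still has to match the rate the audio was really recorded at, which the browser may not have set to the requested value.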

Does anyone have any idea what the issue(s) are here?
