Alright, so I'm working on a class project and I'm trying to send a recording, made in the browser with JavaScript's navigator.mediaDevices.getUserMedia and MediaRecorder, to the backend of my web application (written in Python with Flask) and on to the Google Speech-to-Text API (google-cloud-speech).
So far, I've gotten to the point of making a recording, but I can't seem to get it to the Google Cloud API successfully. Here's how I'm trying to do it:
- Use navigator.mediaDevices.getUserMedia to recognize the user's microphone
- Use the resulting audio stream to make a MediaRecorder object
- Use that recorder object to make a blob with the resulting audio (with {'type' : 'audio/flac'})
- Base64-encode it, write it to a hidden form element, and submit the corresponding form
- From there, the resulting POST request goes to my Python Flask backend, which reads the Base64-encoded data in as a plain string
- Attempt to use the google-cloud-speech client to transcribe the audio
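The decoding step on the backend can be sketched roughly as follows. This is a minimal sketch under assumptions, not the exact code from the project: `decode_audio_field` is a hypothetical helper name, and it assumes the hidden field may carry a `data:<mime>;base64,` prefix (as FileReader.readAsDataURL produces) or may have lost its padding in transit.

```python
import base64
import binascii


def decode_audio_field(field_value: str) -> bytes:
    """Turn the text from the hidden form field back into raw audio bytes."""
    text = field_value.strip()
    # A data URL looks like "data:<mime>;base64,<payload>";
    # keep only the payload if that prefix is present.
    if text.startswith("data:") and "," in text:
        text = text.split(",", 1)[1]
    # If the value was sent without URL-encoding, "+" may arrive as a space.
    text = text.replace(" ", "+")
    # Restore any "=" padding that was trimmed along the way.
    text += "=" * (-len(text) % 4)
    try:
        return base64.b64decode(text, validate=True)
    except binascii.Error as exc:
        raise ValueError("form field does not contain valid Base64") from exc
```

In a Flask view this would be called on the submitted form value, and the returned bytes are what the speech client expects as audio content.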
It's not working. I'm using the Python library, and I can't send the Base64 string directly because the library wants bytes. I've tried Base64-decoding the string back into bytes, but when I run the result through the API, I always get empty ([]) results. After looking this up briefly, it seems that the sample rate could be a problem. I've tried setting the sample rate to 16000 in the navigator.mediaDevices.getUserMedia() call, which looks like this:
navigator.mediaDevices.getUserMedia({ audio: true, sampleRate: 16000 })
and the config part of my client.recognize() call (in my Python backend) looks like this:
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
    sample_rate_hertz=int(sampleRate),
    language_code="en-US",
)
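Since the encoding named in the config has to match the bytes actually sent, one way to check whether the upload really is FLAC is to look at its first few bytes. The `type` option on a Blob is only a label and does not transcode the audio, and browsers' MediaRecorder implementations commonly emit WebM or Ogg. The sketch below is a rough diagnostic, not a full format parser; `sniff_audio_container` is a hypothetical helper name.

```python
def sniff_audio_container(data: bytes) -> str:
    """Guess the container from the leading magic bytes of an upload."""
    if data.startswith(b"fLaC"):
        return "flac"
    if data.startswith(b"OggS"):
        return "ogg"
    if data.startswith(b"\x1a\x45\xdf\xa3"):  # EBML header used by WebM/Matroska
        return "webm"
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    return "unknown"
```

Logging the result for the decoded bytes before calling client.recognize() would show whether the FLAC setting above describes the data that was uploaded.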
Does anyone have any idea what the issue(s) are here?