I am trying to transcribe a Base64-encoded Ogg Opus audio string using the Google Cloud Speech-to-Text API in Node.js. The audio has a sample rate of 48000 Hz. When I run my code, the API sometimes returns an empty transcription. The failure is intermittent: other times the audio is transcribed just fine, and when I come back to the project later the problem reappears seemingly at random. When I convert the Base64 to a Buffer and save the file, the audio plays fine in VLC, and ffprobe shows the correct information for the resulting file.
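A minimal sketch of that decode-and-save check, assuming the input is a data URL whose Base64 payload follows the first comma. The helper names `dataUrlToBuffer` and `saveAudio` are hypothetical and chosen only for illustration:

```typescript
import { writeFileSync } from "node:fs";

// Hypothetical helper: strips the "data:...;base64," header (if present)
// and decodes the remaining Base64 payload into raw bytes.
function dataUrlToBuffer(dataUrl: string): Buffer {
  const comma = dataUrl.indexOf(",");
  const payload = comma >= 0 ? dataUrl.slice(comma + 1) : dataUrl;
  return Buffer.from(payload, "base64");
}

// Hypothetical helper: writes the decoded audio to disk so it can be
// opened in VLC or inspected with ffprobe. Returns the byte count written.
function saveAudio(dataUrl: string, path: string): number {
  const bytes = dataUrlToBuffer(dataUrl);
  writeFileSync(path, bytes);
  return bytes.length;
}
```

Calling `saveAudio(base64Audio, "input.ogg")` and then running ffprobe on `input.ogg` reproduces the check described above.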
I have already checked the audio quality, encoding, sample rate, etc., but none of these checks helped. Here's my code:
import { SpeechClient } from "@google-cloud/speech";

// `base64Audio` looks like this:
// "data:audio/ogg; codecs=opus;base64,T2dnUwACAAAAAAAA..."
export async function transcribeB64(base64Audio: string): Promise<string> {
  const client = new SpeechClient();
  // Drop the "data:...;base64," prefix and keep only the Base64 payload.
  const content = base64Audio.split(",")[1];
  const x = await client.recognize({
    config: {
      encoding: "OGG_OPUS",
      sampleRateHertz: 48000,
      languageCode: "en-US",
    },
    audio: {
      content,
    },
  });
  // Return the raw response tuple so that empty results stay visible.
  return JSON.stringify(x, null, 2);
}
The API response looks like this:
[
  {
    "results": [],
    "totalBilledTime": {
      "seconds": "0",
      "nanos": 0
    },
    "speechAdaptationInfo": null,
    "requestId": "000000"
  },
  null,
  null
]
And this is the ffprobe output:
Input #0, ogg, from 'input.ogg':
  Duration: 00:00:05.74, start: 0.000000, bitrate: 129 kb/s
  Stream #0:0: Audio: opus, 48000 Hz, mono, fltp
    Metadata:
      ENCODER : Mozilla111.0.1
Why is my audio not being transcribed?
I was not able to isolate a root cause, but changing the encoding from "OGG_OPUS" to "WEBM_OPUS" has fixed the problem so far. I would love to hear possible explanations for why this happens, but I have none at the moment.
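One way to narrow this down might be to check which container each payload actually uses before choosing the encoding. The sketch below is a diagnostic, not an explanation of the behavior. It assumes that looking at the first four bytes is enough to tell the two containers apart, and `sniffOpusContainer` is a hypothetical helper name. Ogg streams begin with the ASCII capture pattern "OggS", while WebM (Matroska) files begin with the EBML header ID 0x1A 0x45 0xDF 0xA3.

```typescript
// Encoding names as they appear in the Speech-to-Text v1 AudioEncoding enum.
type OpusContainer = "OGG_OPUS" | "WEBM_OPUS";

// Hypothetical helper: guesses the container from the first four bytes.
// Returns undefined when the bytes match neither signature.
function sniffOpusContainer(bytes: Uint8Array): OpusContainer | undefined {
  if (bytes.length < 4) return undefined;
  // "OggS" in ASCII: 0x4F 0x67 0x67 0x53
  if (bytes[0] === 0x4f && bytes[1] === 0x67 && bytes[2] === 0x67 && bytes[3] === 0x53) {
    return "OGG_OPUS";
  }
  // EBML header ID that opens Matroska/WebM files.
  if (bytes[0] === 0x1a && bytes[1] === 0x45 && bytes[2] === 0xdf && bytes[3] === 0xa3) {
    return "WEBM_OPUS";
  }
  return undefined;
}
```

In Node.js, `sniffOpusContainer(Buffer.from(content, "base64"))` could be logged next to each request to see whether the payloads that come back empty differ in container from the ones that transcribe correctly.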