Is there a way to encode an audio Blob from the MediaRecorder API in LINEAR16?


I want to send audio from the user's microphone directly to the Google Speech-to-Text API in real time, so I would like to use its streaming recognition (RecognizeStream) feature.

To do this, I use the MediaRecorder API on a webpage connected to my NodeJS server via websockets. Every 2 seconds, I get an audio blob from the MediaRecorder's ondataavailable event and send it to my server over the websocket. My server then forwards the blob to the Speech-to-Text API through the Google client library with recognizeStream.write(blob). Right now the API returns nothing and times out; I read that this is because of the encoding.

The problem is that I am new to audio encoding and I can't find a way to convert the original blob (WebM/Opus) to LINEAR16 without going through physical files. I'd like to do the conversion directly in code, because I want to keep latency as low as possible.
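One idea I had is to skip MediaRecorder's compressed output entirely and capture raw PCM in the browser (for example with an AudioWorklet), since LINEAR16 is just 16-bit signed little-endian PCM. Web Audio hands you Float32 samples in [-1, 1], so the core of it would be a conversion like this (an untested sketch; float32ToInt16 is a hypothetical helper of mine, and the AudioWorklet wiring around it is not shown):

```javascript
// Convert Web Audio Float32 samples ([-1, 1]) into 16-bit signed PCM,
// which is what the LINEAR16 encoding expects.
function float32ToInt16(float32Samples) {
  const int16 = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    int16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return int16;
}
```

The resulting Int16Array's underlying buffer could then be sent over the websocket as-is, with no container format to decode on the server.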

I tried to use the SoX software through its npm package to do the conversion, but I can't find a way to use it without physical files.

Here is the frontend code that allows me to send data to the backend.

const mediaConstraints = { audio: true };

mediaStream = await navigator.mediaDevices.getUserMedia(mediaConstraints);
mediaRecorder = new MediaRecorder(mediaStream);
let isRecording = false;

mediaRecorder.ondataavailable = (event) => {
  if (isRecording && event.data.size > 0) {
    socket.emit('audioChunk', event.data);
  }
};

mediaRecorder.start(2000); // emit a blob every 2 seconds
isRecording = true;

Here are some parts of my NodeJS backend.

const encoding = 'LINEAR16';
const languageCode = 'fr-FR';
const sampleRateHertz = 16000;

const request = {
  config: {
    encoding: encoding,
    languageCode: languageCode,
    sampleRateHertz: sampleRateHertz,
  },
  interimResults: true, // receive partial results while the user is speaking
};



io.on('connection', (socket) => {
  console.log('A user connected');

  const recognizeStream = client
    .streamingRecognize(request)
    .on('error', console.error)
    .on('data', (data) =>
      process.stdout.write(
        data.results[0] && data.results[0].alternatives[0]
          ? `Transcription: ${data.results[0].alternatives[0].transcript}\n`
          : '\n\nReached transcription time limit, press Ctrl+C\n'
      )
    );

  socket.on('audioChunk', (chunk) => {
    // `chunk` arrives here as a Node Buffer containing WebM/Opus data
    // CONVERSION NEEDED: transcode to LINEAR16 before writing
    recognizeStream.write(chunk);
  });
});