Is there a way to encode an audio Blob from the MediaRecorder API in LINEAR16?


I want to send audio from the user's microphone directly to the Google Speech-to-Text API in real time, so I would like to use its streaming recognition (RecognizeStream) feature.

To do this, I use the MediaRecorder API on a webpage connected to my NodeJS server via websockets. Every 2 seconds, I get an audio blob from the MediaRecorder's ondataavailable event and send it to my server over the websocket. My server then forwards the blob to the Speech-to-Text API through the Google client library with recognizeStream.write(blob). Right now the API returns nothing and times out; I read that this is because of the encoding.

The problem is that I am new to audio encoding and I can't find a way to convert the original blob (WebM/Opus) to LINEAR16 without going through physical files. I'd like to do the conversion directly in code, because I want to keep latency as low as possible.
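One idea I had is to skip MediaRecorder's compressed output entirely and capture raw PCM in the browser (for example with an AudioWorklet), since LINEAR16 is just 16-bit signed little-endian PCM. Web Audio hands you Float32 samples in [-1, 1], so the core of it would be a conversion like this (an untested sketch; float32ToInt16 is a hypothetical helper of mine, and the AudioWorklet wiring around it is not shown):

```javascript
// Convert Web Audio Float32 samples ([-1, 1]) into 16-bit signed PCM,
// which is what the LINEAR16 encoding expects.
function float32ToInt16(float32Samples) {
  const int16 = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    int16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return int16;
}
```

The resulting Int16Array's underlying buffer could then be sent over the websocket as-is, with no container format to decode on the server.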

I tried to use the SoX software through its npm package to do the conversion, but I can't find a way to use it without physical files.

Here is the frontend code that allows me to send data to the backend.

const mediaConstraints = { audio: true };

mediaStream = await navigator.mediaDevices.getUserMedia(mediaConstraints);
mediaRecorder = new MediaRecorder(mediaStream);
let isRecording = false;

mediaRecorder.ondataavailable = (event) => {
  if (isRecording && event.data.size > 0) {
    socket.emit('audioChunk', event.data);
  }
};

mediaRecorder.start(2000); // emit a blob every 2 seconds
isRecording = true;

Here are some parts of my NodeJS backend.

const encoding = 'LINEAR16';
const languageCode = 'fr-FR';
const sampleRateHertz = 16000;

const request = {
  config: {
    encoding: encoding,
    languageCode: languageCode,
    sampleRateHertz: sampleRateHertz,
  },
  interimResults: true, // receive partial results while the user is speaking
};



io.on('connection', (socket) => {
  console.log('A user connected');

  const recognizeStream = client
    .streamingRecognize(request)
    .on('error', console.error)
    .on('data', (data) =>
      process.stdout.write(
        data.results[0] && data.results[0].alternatives[0]
          ? `Transcription: ${data.results[0].alternatives[0].transcript}\n`
          : '\n\nReached transcription time limit, press Ctrl+C\n'
      )
    );

  socket.on('audioChunk', (chunk) => {
    // `chunk` arrives here as a Node Buffer containing WebM/Opus data
    // CONVERSION NEEDED: transcode to LINEAR16 before writing
    recognizeStream.write(chunk);
  });
});