Trouble Capturing and Playing Tab Audio in Chrome Extension with Transcription


I'm developing a Chrome extension that captures a browser tab's audio together with microphone input, sends the combined stream to a transcription service (Azure in my case), and simultaneously plays the tab audio through the system's speakers. However, I can't get the tab audio to play through the speakers, and enabling playback interferes with the transcription service, possibly by creating a feedback loop.

What I'm Trying to Achieve:

  1. Capture a Chrome tab's audio along with microphone input.
  2. Use this audio for transcription via Azure's Speech SDK.
  3. Play the captured tab audio through the system's speakers.

Transcription works perfectly for both the tab audio and the microphone input.

The Problem:

  • When I try to play the captured audio (tabAudioMediaStream), there's no sound output.
  • Enabling the sound output seems to degrade the transcription quality, likely due to a feedback loop. The transcription service starts returning unrecognized text.

Code Snippet:

Here's the relevant part of my code:

Getting the media stream ID in background.js:

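chrome.tabs.query({ active: true, currentWindow: true }, (tabs) => {
  if (tabs.length === 0) return;
  const tabId = tabs[0].id;
  // Request a stream ID for the active tab, to be consumed in that same tab.
  chrome.tabCapture.getMediaStreamId({ targetTabId: tabId, consumerTabId: tabId })
    .then((tabStreamId) => {
      console.log("StreamId: ", tabStreamId);
      chrome.tabs.sendMessage(tabId, { type: "tabStreamId", tabStreamId });
    })
    .catch((err) => console.error("getMediaStreamId failed:", err));
});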

The media stream ID is then passed to the content script, which captures the tab audio and the microphone stream and passes the mixed stream to the Azure transcription service. Here's the relevant code.


const constraints = {
    audio: { echoCancellation: true },
};

    navigator.mediaDevices.getUserMedia({
        audio: {
            mandatory: {
                chromeMediaSource: "tab",
                chromeMediaSourceId: tabStreamId,
            },
        },
        video: false
    }).then(tabAudioMediaStream => {

        // continueToPlayCapturedAudio(tabAudioMediaStream.clone());
        if (tabAudioMediaStream.getAudioTracks().length === 0) {
            console.error("No audio tracks in tab stream");
            return;
        }
        // Attempt to play audio directly - this did not produce sound
        // let audio = new Audio();
        // audio.srcObject = tabAudioMediaStream;
        // audio.play();

        let tabAudioMediaSourceNode = audioContext.createMediaStreamSource(tabAudioMediaStream);
        // I expected this connection to play the audio through the speakers, but it produced no sound and instead degraded the transcription service's response
        // tabAudioMediaSourceNode.connect(audioContext.destination);

        navigator.mediaDevices.getUserMedia(constraints)
            .then(micAudioMediaStream => {
                // Handle microphone audio stream
                let micAudioMediaSourceNode = audioContext.createMediaStreamSource(micAudioMediaStream);
                
                return micAudioMediaSourceNode
            }).then(micAudioMediaSourceNode => {

                tabAudioMediaSourceNode.connect(destination);
                micAudioMediaSourceNode.connect(destination);

                output = new MediaStream();
                output.addTrack(destination.stream.getAudioTracks()[0]);
                // start the transcription service using the stream
                startAzureTranscriptionService(output);
            })
    })
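
For reference, the intended routing can be sketched as two separate paths: tab audio alone goes to the speakers, while tab plus microphone audio goes to a stream destination for transcription. This is only a minimal sketch under assumptions: `wireTabAndMic` is a hypothetical helper name, and it assumes that `destination` in the snippet above comes from `audioContext.createMediaStreamDestination()`, which the snippet does not show. It is not a confirmed fix for the missing sound.

```javascript
// Hypothetical helper (not from the original code): wires tab and mic audio
// into a playback path and a separate transcription path.
// `audioContext` is an AudioContext; `tabStream` and `micStream` are MediaStreams.
function wireTabAndMic(audioContext, tabStream, micStream) {
  const tabSource = audioContext.createMediaStreamSource(tabStream);
  const micSource = audioContext.createMediaStreamSource(micStream);

  // Playback path: only the tab audio is routed to the speakers.
  // The microphone is deliberately NOT connected here.
  tabSource.connect(audioContext.destination);

  // Transcription path: tab + mic are mixed into a MediaStreamAudioDestinationNode.
  const mixDestination = audioContext.createMediaStreamDestination();
  tabSource.connect(mixDestination);
  micSource.connect(mixDestination);

  // The mixed stream is what would be handed to the transcription service.
  return mixDestination.stream;
}
```

With this wiring, the returned stream would be passed to `startAzureTranscriptionService`. Note that audio played through the speakers can still be picked up acoustically by the microphone, so the tab audio may reach the mix twice; headphones or the `echoCancellation` constraint may reduce that, though this is untested here.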

continueToPlayCapturedAudio()


manifest.json permissions (I am using MV3):


  "permissions": [
    "storage",
    "identity",
    "tabs",
    "tabCapture",
    "activeTab"
  ],

I am seeking guidance on the following:

  1. Why might the audio not be playing through the speakers in both these approaches?
  2. Are there additional considerations or configurations in the Web Audio API or Chrome extension environment that I might be missing? Any insights or suggestions to resolve this issue would be greatly appreciated.
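
One consideration that may be relevant to the second question, offered as an assumption rather than a diagnosis: under Chrome's autoplay policy an `AudioContext` created without a user gesture can start in the `suspended` state, in which case nodes connected to `audioContext.destination` produce no sound until `resume()` is called. The helper below is a minimal sketch; `resumeIfSuspended` is a hypothetical name, not part of the code above.

```javascript
// Hypothetical helper: resume the AudioContext if it is suspended.
// Resolves with the context's state after the attempt.
async function resumeIfSuspended(audioContext) {
  if (audioContext.state === "suspended") {
    // resume() returns a Promise; Chrome may only honor it after a user gesture.
    await audioContext.resume();
  }
  return audioContext.state;
}
```

It would typically be called from a user-initiated event handler (for example a click on the extension's UI) before connecting the tab source to the speakers, and logging the returned state would show whether the context is actually running.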

I expected the audio to be played through the speakers. I tried a couple of approaches, but none of them worked; instead, they started affecting the transcription service's responses.

Issue:

  • No sound is produced when trying to play the tab audio through the speakers.
  • Attempts to play the audio interfere with the transcription service, leading to poor transcription quality or unrecognizable input.