Can't stream azure tts from server to client side using a pushstream; response on client side is size 0 and the server is working

Question

Can't stream azure tts from server to client side using a pushstream; response on client side is size 0 and the server is working

97 Views Asked by Kaushik Verukonda At 07 January 2024 at 00:51

I am attempting to stream Azure TTS from my server to the client using the Fetch API and a PassThrough push stream. The expected outcome is to receive the stream in chunks. The actual output is a response object with no information. I have tried creating a ReadableStream using Fetch, but when I try to log the response, I get an error message that my response object is size 0. I have also tried to see if anything is getting sent through in chunks, but everything is size 0. I have tried to debug my backend but from everything I can tell, it is working properly. If anyone has solved this issue or has demo code for streaming TTS in JavaScript, please let me know. This is my actual function code. I believe it works.:

const generateSpeechFromText = async (text) => {
  const speechConfig = sdk.SpeechConfig.fromSubscription(
    process.env.SPEECH_KEY,
    process.env.SPEECH_REGION
  );
  speechConfig.speechSynthesisVoiceName = "en-US-JennyNeural";
  speechConfig.speechSynthesisOutputFormat =
    sdk.SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3;

  const synthesizer = new sdk.SpeechSynthesizer(speechConfig);

  return new Promise((resolve, reject) => {
    synthesizer.speakTextAsync(
      text,
      (result) => {
        if (result.reason === sdk.ResultReason.SynthesizingAudioCompleted) {
          const bufferStream = new PassThrough();
          bufferStream.end(Buffer.from(result.audioData));
          resolve(bufferStream);
        } else {
          console.error("Speech synthesis canceled: " + result.errorDetails);
          reject(new Error("Speech synthesis failed"));
        }
        synthesizer.close();
      },
      (error) => {
        console.error("Error in speech synthesis: " + error);
        synthesizer.close();
        reject(error);
      }
    );
  });

This is my index.js route code to send to the frontend. I believe it works but there could be an error.:

app.get("/textToSpeech", async (request, reply) => {
  if (textWorks) {
    try {
      const stream = await generateSpeechFromText(
        textWorks
      
      );
      console.log("Stream created, sending to client: ", stream);
      reply.type("audio/mpeg").send(stream);
    } catch (err) {
      console.error(err);
      reply.status(500).send("Error in text-to-speech synthesis");
    }
  } else {
    reply.status(404).send("OpenAI response not found");
  }
});

This is my frontend client code. I think the error has to do with the response object, but I am not sure.:

// Fetch TTS from Backend
export const fetchTTS = async (): Promise<Blob | null> => {
  try {
    const response = await fetch("http://localhost:3000/textToSpeech", {
      method: "GET",
    });
    // the response is size 0 and has no information
    if (!response.ok) {
      throw new Error(`HTTP error! status: ${response.status}`);
    }
    
    const body = response.body;
    console.log("Body", body);
    if (!body) {
      console.error("Response body is not a readable stream.");
      return null;
    }

    const reader = body.getReader();
    let chunks: Uint8Array[] = [];

    const read = async () => {
      const { done, value } = await reader.read();
      if (done) {
        return;
      }

      if (value) {
        chunks.push(value);
      }
      await read();
    };
    console.log("Chunks", chunks);

    await read();

    const audioBlob = new Blob(chunks, { type: "audio/mpeg" });

    // console.log("Audio Blob: ", audioBlob);
    // console.log("Audio Blob Size: ", audioBlob.size);

    return audioBlob.size > 0 ? audioBlob : null;
  } catch (error) {
    console.error("Error fetching text-to-speech audio:", error);
    return null;
  }
};

I have tried reading it into a blob directly afterwards I attempted to create a ReadableStream object using the FetchApi, this is when I realized that response object was size 0. I have console logged and attempted to debug the server side code and based on the console log statements the server side code is working as intended. It is breaking the audio into chunks and pushing the chunks into the client side.

Original Q&A

There are 1 best solutions below

**Sampath** · Answer 1 · 2024-01-08T13:04:21.503000

The simple code below is an Express.js server that uses the Microsoft Azure Cognitive Services Speech SDK to convert text to speech.

Server-side:

const express = require('express');
const sdk = require("microsoft-cognitiveservices-speech-sdk");
const { PassThrough } = require('stream');
const app = express();

const generateSpeechFromText = async (text) => {
  const speechConfig = sdk.SpeechConfig.fromSubscription(
    process.env.SPEECH_KEY,
    process.env.SPEECH_REGION
  );
  speechConfig.speechSynthesisVoiceName = "en-US-JennyNeural";
  speechConfig.speechSynthesisOutputFormat =
    sdk.SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3;

  const synthesizer = new sdk.SpeechSynthesizer(speechConfig);

  return new Promise((resolve, reject) => {
    synthesizer.speakTextAsync(
      text,
      (result) => {
        if (result.reason === sdk.ResultReason.SynthesizingAudioCompleted) {
          const bufferStream = new PassThrough();
          bufferStream.end(Buffer.from(result.audioData));
          resolve(bufferStream);
        } else {
          console.error("Speech synthesis canceled: " + result.errorDetails);
          reject(new Error("Speech synthesis failed"));
        }
        synthesizer.close();
      },
      (error) => {
        console.error("Error in speech synthesis: " + error);
        synthesizer.close();
        reject(error);
      }
    );
  });
};

app.get("/textToSpeech", async (request, reply) => {
  const textWorks = "Your desired text goes here"; // Replace with your text
  if (textWorks) {
    try {
      const stream = await generateSpeechFromText(textWorks);
      reply.type("audio/mpeg");
      stream.pipe(reply); // Send the stream directly without attempting JSON conversion
    } catch (err) {
      console.error(err);
      reply.status(500).send("Error in text-to-speech synthesis");
    }
  } else {
    reply.status(404).send("Text not provided");
  }
});

const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
  console.log(`Server is running on port ${PORT}`);
});

Client-side

// Fetch TTS from Backend
export const fetchTTS = async () => {
  try {
    const response = await fetch("http://localhost:3000/textToSpeech", {
      method: "GET",
    });

    if (!response.ok) {
      throw new Error(`HTTP error! status: ${response.status}`);
    }

    const audioBuffer = await response.arrayBuffer();
    
    if (audioBuffer.byteLength > 0) {
      const audioContext = new (window.AudioContext || window.webkitAudioContext)();
      const audioBlob = new Blob([audioBuffer], { type: "audio/mpeg" });
      const audioUrl = URL.createObjectURL(audioBlob);
      const audio = new Audio(audioUrl);

      // Play the fetched audio
      audio.play();

      console.log("Fetched audio response:", audioBlob); // Print the audio response
      
      return audioBlob;
    } else {
      console.error("Empty audio response");
      return null;
    }
  } catch (error) {
    console.error("Error fetching text-to-speech audio:", error);
    return null;
  }
};

enter image description here

Client-side (or)

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Text-to-Speech Client</title>
</head>
<body>

  <h1>Text-to-Speech Client</h1>

  <label for="textInput">Enter Text:</label>
  <textarea id="textInput" rows="4" cols="50" placeholder="Enter your text here"></textarea>

  <button id="convertButton">Convert to Speech</button>

  <audio id="audioPlayer" controls style="display:none;"></audio>

  <script>
    document.addEventListener('DOMContentLoaded', function () {
      const convertButton = document.getElementById('convertButton');
      const textInput = document.getElementById('textInput');
      const audioPlayer = document.getElementById('audioPlayer');

      convertButton.addEventListener('click', async function () {
        const text = textInput.value.trim();

        if (text) {
          try {
            const response = await fetch(`/textToSpeech?text=${encodeURIComponent(text)}`);
            const audioBuffer = await response.arrayBuffer();
            const audioBlob = new Blob([audioBuffer], { type: 'audio/mpeg' });
            const audioUrl = URL.createObjectURL(audioBlob);

            audioPlayer.src = audioUrl;
            audioPlayer.style.display = 'block';
          } catch (error) {
            console.error('Error in text-to-speech request:', error);
          }
        } else {
          alert('Please enter text before converting to speech.');
        }
      });
    });
  </script>

</body>
</html>

For TTS you can manually compile OpenSSL 1.1.1 yourself by following the instructions here

Can't stream azure tts from server to client side using a pushstream; response on client side is size 0 and the server is working

There are 1 best solutions below

Related Questions in JAVASCRIPT

Related Questions in STREAMING

Related Questions in TEXT-TO-SPEECH

Related Questions in AZURE-SPEECH

Trending Questions

Popular # Hahtags

Popular Questions