Google Text-To-Speech, Twilio, MULAW, PHP, Remove WAV Header


I am trying to get audio from Google Text-To-Speech to Twilio MULAW/8000.

It is all working fine apart from a click when the audio starts playing.

I see that Google sends a WAV header along with the MULAW audio, so the header ends up in the base64 payload. How can I remove the WAV header from the payload? My code is below.

This plays / works fine, I just want to remove the click sound when the audio starts playing. I think removing the WAV header should do the trick.

use Google\Cloud\TextToSpeech\V1\AudioConfig;
use Google\Cloud\TextToSpeech\V1\AudioEncoding;
use Google\Cloud\TextToSpeech\V1\SynthesisInput;
use Google\Cloud\TextToSpeech\V1\TextToSpeechClient;
use Google\Cloud\TextToSpeech\V1\VoiceSelectionParams;

$textToSpeechClient = new TextToSpeechClient();

$input = new SynthesisInput();
$input->setText('This is my test audio');
$voice = new VoiceSelectionParams();
$voice->setLanguageCode('en-US');
$audioConfig = new AudioConfig();
$audioConfig->setAudioEncoding(AudioEncoding::MULAW);
$audioConfig->setSampleRateHertz(8000);

$resp = $textToSpeechClient->synthesizeSpeech($input, $voice, $audioConfig);

$audioContent = $resp->getAudioContent();
$TheBase64Audio = base64_encode($audioContent);

Mel On

As mentioned in the "WAVE File Format Analysis" section of this article:

The telephony standard for audio is 8-bit PCM mono uLaw (MULAW) with a sampling rate of 8 kHz. The payload of the media message should not contain the audio file type header bytes. So it's essential to understand the WAV file header fields so that you can strip them off before sending the audio data to the user.

A standard WAV file header comprises the following fields:

Positions (bytes) | Sample Value | Description
1-4 | “RIFF” | Marks the file as a RIFF file. Characters are each 1 byte long.
5-8 | File size (integer) | Size of the overall file minus 8 bytes, in bytes (32-bit integer). Typically, you’d fill this in after creation.
9-12 | “WAVE” | File type header. For our purposes, it always equals “WAVE”.
13-16 | “fmt ” | Format chunk marker. Includes a trailing space.
17-20 | 16 | Length of the format data (the fields in positions 21-36), 32-bit integer.
21-22 | 1 | Type of format (1 is PCM), 2-byte integer.
23-24 | 2 | Number of channels, 2-byte integer.
25-28 | 44100 | Sample rate, 32-bit integer. Common values are 44100 (CD), 48000 (DAT). Sample rate = number of samples per second, or Hertz.
29-32 | 176400 | Byte rate: (Sample Rate * BitsPerSample * Channels) / 8.
33-34 | 4 | Block align: (BitsPerSample * Channels) / 8. 1 = 8-bit mono; 2 = 8-bit stereo or 16-bit mono; 4 = 16-bit stereo.
35-36 | 16 | Bits per sample.
37-40 | “data” | “data” chunk header. Marks the beginning of the data section.
41-44 | File size (data) | Size of the data section.
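
To check what header a response actually carries, the field layout in the table can be decoded with PHP's unpack(). This is only a sketch: parseCanonicalWavHeader is a hypothetical helper name, and it assumes the canonical 44-byte PCM layout listed in the table, with all integers stored little-endian.

```php
<?php
// Hypothetical helper: decode the canonical 44-byte WAV header from the table.
// Assumes the PCM layout shown there; other encodings may place fields differently.
function parseCanonicalWavHeader(string $bytes): array
{
    if (strlen($bytes) < 44) {
        throw new InvalidArgumentException('Need at least 44 bytes');
    }

    // a4 = 4-character string, V = unsigned 32-bit little-endian,
    // v = unsigned 16-bit little-endian. Names after each code become array keys.
    return unpack(
        'a4riff/VfileSize/a4wave/a4fmt/VfmtLength/vformat/vchannels/'
        . 'VsampleRate/VbyteRate/vblockAlign/vbitsPerSample/a4data/VdataSize',
        substr($bytes, 0, 44)
    );
}
```

If the 'data' key of the result does not contain the string "data", the header is not the canonical 44-byte form and a fixed 44-byte offset would cut in the wrong place.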

From this table, the WAV header is 44 bytes long, so you can skip or remove the first 44 bytes of the audio content to omit it.
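
A minimal sketch of the stripping step in PHP follows. Rather than hard-coding 44, it walks the RIFF chunks until it reaches "data". The 44-byte figure applies to the plain PCM layout in the table, while WAV files in other encodings such as mu-law can carry a longer "fmt " chunk and an extra "fact" chunk, so the offset may differ. stripWavHeader is a hypothetical helper name, and whether Google's MULAW response uses exactly 44 bytes is not confirmed here.

```php
<?php
// Hypothetical helper: return only the sample bytes of a RIFF/WAVE container.
function stripWavHeader(string $audio): string
{
    // Not a RIFF/WAVE container: assume the bytes are already raw audio.
    if (strlen($audio) < 12
        || substr($audio, 0, 4) !== 'RIFF'
        || substr($audio, 8, 4) !== 'WAVE') {
        return $audio;
    }

    $length = strlen($audio);
    $offset = 12; // first chunk starts after "RIFF", the file size, and "WAVE"

    while ($offset + 8 <= $length) {
        $chunkId   = substr($audio, $offset, 4);
        $chunkSize = unpack('V', substr($audio, $offset + 4, 4))[1];

        if ($chunkId === 'data') {
            // Clamp the size in case the declared length exceeds the bytes present.
            return substr($audio, $offset + 8, min($chunkSize, $length - $offset - 8));
        }

        // Skip this chunk; chunks are padded to an even number of bytes.
        $offset += 8 + $chunkSize + ($chunkSize % 2);
    }

    throw new RuntimeException('No "data" chunk found');
}

// Usage with the variables from the question:
// $TheBase64Audio = base64_encode(stripWavHeader($audioContent));
```

If the header is known to be exactly 44 bytes, substr($audioContent, 44) gives the same result with less code. The chunk walk simply avoids leaving stray header bytes in the payload when the header is longer.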