Azure text to speech: add silence

Hi, I'm trying to add a pause to speech generated with the Azure text-to-speech API. The code below works fine, but I can't add silence. I need to insert silence whenever a specific character is found in the text. I tried a break tag and <mstts:ttsbreak strength="none" />, but the audio output reads the tags aloud instead of pausing.

       $doc = new DOMDocument();

       $root = $doc->createElement( "speak" );
       $root->setAttribute( "xmlns" , "http://www.w3.org/2001/10/synthesis" );
       $root->setAttribute( "xmlns:mstts" , "http://www.w3.org/2001/mstts" );
       $root->setAttribute( "xmlns:emo" , "http://www.w3.org/2009/10/emotionml" );
       $root->setAttribute( "version" , "1.0" );
       $root->setAttribute( "xml:lang" , "$spechlang" );           

       $voice = $doc->createElement( "voice" );
       $voice->setAttribute( "name" , "$myvoice");

       $style = $doc->createElement( "mstts:express-as" );
       $style->setAttribute( "style" , "whispering"); // 

       $prosody = $doc->createElement( "prosody" );
       $prosody->setAttribute( "rate" , "$rate.00%" ); 
       $prosody->setAttribute( "pitch" , "$pitch.00%" );  

       $text = $doc->createTextNode( "$mytext" );

       $prosody->appendChild( $text );
       $style->appendChild( $prosody );
       $voice->appendChild( $style );
       $root->appendChild( $voice );
       $doc->appendChild( $root );
       $data = $doc->saveXML();

       $options = array(
        'http' => array(
            'header'  => "Content-type: application/ssml+xml\r\n" .
                    "X-Microsoft-OutputFormat: riff-24khz-16bit-mono-pcm\r\n" .
                    "Authorization: "."Bearer ".$access_token."\r\n" .
                    "X-Search-AppId: 07D3234E56TT426DAA29772419F436CA\r\n" .
                    "X-Search-ClientID: 1ECFAE91406677A480F00935DC390960\r\n" .
                    "User-Agent: TTSPHP\r\n" .
                    "content-length: ".strlen($data)."\r\n",
            'method'  => 'POST',
            'content' => $data,
            ),
        );                                                

        $context  = stream_context_create($options);

        // get the wave data
        $result = file_get_contents($ttsServiceUri, false, $context);

The generated SSML could look like this:

<speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" version="1.0" xml:lang="it-IT">
<voice name="en-US-JennyNeural">
1. first point: <break strength="medium" />
lore lipso bla bla.
</voice></speak>

This SSML works fine in the Speech Studio portal (https://speech.microsoft.com/portal) but not in my PHP code. Thanks.

There are 2 best solutions below

I believe the break element does not use the mstts namespace, but the silence element (mstts:silence) does. Here are the official docs: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-synthesis-markup-structure#add-silence
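
For illustration, an mstts:silence element can be built with the same DOMDocument approach the question uses. This is only a minimal sketch, assuming the `type` and `value` attributes described in the linked documentation. The voice name, the sample text, and the 500ms value are placeholders, and the `https://` form of the mstts namespace URI is an assumption to check against the docs.

```php
<?php
// Sketch: build SSML that contains an <mstts:silence> element.
// Assumption: type="Sentenceboundary" and value="500ms" follow the linked docs.
$doc = new DOMDocument();

$root = $doc->createElement('speak');
$root->setAttribute('xmlns', 'http://www.w3.org/2001/10/synthesis');
$root->setAttribute('xmlns:mstts', 'https://www.w3.org/2001/mstts');
$root->setAttribute('version', '1.0');
$root->setAttribute('xml:lang', 'en-US');

$voice = $doc->createElement('voice');
$voice->setAttribute('name', 'en-US-JennyNeural'); // placeholder voice

// The silence element is a real child element of <voice>, not text.
$silence = $doc->createElement('mstts:silence');
$silence->setAttribute('type', 'Sentenceboundary');
$silence->setAttribute('value', '500ms');
$voice->appendChild($silence);

// The spoken text follows as an ordinary text node.
$voice->appendChild($doc->createTextNode('First point. Second point.'));

$root->appendChild($voice);
$doc->appendChild($root);

$ssml = $doc->saveXML();
```

The resulting `$ssml` string can be sent as the request body in the same way as `$data` in the question.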

This is a complete SSML document that uses the break element, which does not need the mstts namespace. The time is given in milliseconds (1 second = 1000 ms).

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="string" xmlns:mstts="https://www.w3.org/2001/mstts">
  <voice  name="en-AU-NatashaNeural">
I have to think about it. Wait two seconds 
<break time="2000ms"/>
Yes I agree 
</voice>
</speak>
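
One possible reason the tags are read aloud from PHP: `createTextNode()` escapes markup, so a `<break>` written inside `$mytext` is serialized as `&lt;break ...&gt;` and treated as plain text. The sketch below splits the text on a marker character and inserts real break elements instead. The helper name `appendTextWithBreaks`, the `|` marker, and the 500ms default are hypothetical choices for illustration, not part of the original code.

```php
<?php
// Hypothetical helper: appends $text to $parent, replacing every
// occurrence of $marker with a real <break> element.
function appendTextWithBreaks(
    DOMDocument $doc,
    DOMElement $parent,
    string $text,
    string $marker,
    string $time = '500ms'
): void {
    foreach (explode($marker, $text) as $i => $part) {
        if ($i > 0) {
            // A marker sat between the previous part and this one.
            $break = $doc->createElement('break');
            $break->setAttribute('time', $time);
            $parent->appendChild($break);
        }
        if ($part !== '') {
            $parent->appendChild($doc->createTextNode($part));
        }
    }
}

$doc = new DOMDocument();

$root = $doc->createElement('speak');
$root->setAttribute('xmlns', 'http://www.w3.org/2001/10/synthesis');
$root->setAttribute('version', '1.0');
$root->setAttribute('xml:lang', 'en-US');

$voice = $doc->createElement('voice');
$voice->setAttribute('name', 'en-US-JennyNeural'); // placeholder voice

// '|' is the assumed marker character that should become a pause.
appendTextWithBreaks($doc, $voice, '1. first point:|lorem ipsum', '|');

$root->appendChild($voice);
$doc->appendChild($root);

$data = $doc->saveXML();
```

In the question's code the same helper could be called with `$prosody` as the parent in place of the single `createTextNode( "$mytext" )` call, so the express-as and prosody wrappers stay unchanged.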