IBM Cloud - How to adjust speaking rate in Watson TTS using curl POST?


I'm having trouble adjusting the prosody speaking rate in IBM Watson's Text to Speech (TTS) service using curl. Here is the code I've tried. It does synthesize audio, but it completely ignores the --header "prosody rate: +50%" line I inserted. That was to be expected, because I didn't know how to do this and just improvised that line. Does anyone know how I could get it to work as intended? I want to speed the speech up by 50%, but I can't find anything in the docs that covers this request format.

Thanks!

curl -X POST -u "apikey:apikey" ^
--header "Content-Type: application/json" ^
--header "Accept: audio/wav" ^
--header "prosody rate: +50%" ^
--data "{\"text\":\"Adult capybaras are one meter long.\"}" ^
--output hello_world.wav ^
"URL/v1/synthesize?voice=en-US_HenryV3Voice"


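prosody is an SSML element, not an HTTP header, so it has to be written as tags around the text that you are synthesising. Drop the prosody header line and change the --data line to the one below. The SSML attribute value is in single quotes so that it doesn't end the JSON string early, and the < and > are escaped with ^ because cmd treats them as redirection operators outside the text it considers quoted.

--data "{\"text\":\"^<prosody rate='fast'^>Adult capybaras are one meter long.^</prosody^>\"}" ^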
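In bash or another POSIX shell, the nested quoting can be avoided by building the body with printf. This is a minimal sketch, not an official client: make_prosody_body, TTS_APIKEY and TTS_URL are hypothetical names, and the text is assumed to contain no characters that need JSON escaping.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print a JSON body whose "text" wraps $2 in an SSML
# <prosody> tag with rate $1. Single quotes around the attribute value avoid
# nesting double quotes inside the JSON string. Assumes $1 and $2 contain no
# characters that need JSON escaping (double quotes, backslashes, newlines).
make_prosody_body() {
  printf '{"text": "<prosody rate='\''%s'\''>%s</prosody>"}' "$1" "$2"
}

body=$(make_prosody_body '+50%' 'Adult capybaras are one meter long.')
printf '%s\n' "$body"

# Only call the service when TTS_APIKEY and TTS_URL (hypothetical variable
# names for your API key and service URL) are set.
if [ -n "${TTS_APIKEY:-}" ] && [ -n "${TTS_URL:-}" ]; then
  curl -X POST -u "apikey:${TTS_APIKEY}" \
    --header "Content-Type: application/json" \
    --header "Accept: audio/wav" \
    --data "$body" \
    --output hello_world.wav \
    "${TTS_URL}/v1/synthesize?voice=en-US_HenryV3Voice"
fi
```

Passing fast instead of +50% selects a named rate rather than a relative one.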


Here's a working example of the POST call for bash or another Unix-like shell:

curl -X POST -u "apikey:{API_KEY}" \
--header "Accept: audio/wav" \
--header "Content-Type: application/json" \
--data '{"text": "<p><s><prosody rate=\"+50%\">This is the first sentence of the paragraph.</prosody></s><s>Here is another sentence.</s><s>Finally, this is the last sentence.</s></p>"}' \
--output result.wav \
"{URL}/v1/synthesize" -v
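Because --output saves whatever the service returns, a failed request (for example, a wrong API key) leaves an error body rather than audio in result.wav. A minimal sketch that checks for the RIFF header that WAV files start with; looks_like_wav, TTS_APIKEY and TTS_URL are hypothetical names.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: succeed if the file begins with the 4-byte "RIFF"
# magic that WAV files start with.
looks_like_wav() {
  [ "$(head -c 4 "$1")" = "RIFF" ]
}

# Only call the service when TTS_APIKEY and TTS_URL (hypothetical names) are set.
if [ -n "${TTS_APIKEY:-}" ] && [ -n "${TTS_URL:-}" ]; then
  curl -X POST -u "apikey:${TTS_APIKEY}" \
    --header "Accept: audio/wav" \
    --header "Content-Type: application/json" \
    --data '{"text": "<prosody rate=\"+50%\">This is the first sentence of the paragraph.</prosody>"}' \
    --output result.wav \
    "${TTS_URL}/v1/synthesize"
  # Without --fail, curl writes the response body to result.wav even for an
  # HTTP error status, so inspect the bytes rather than trusting the file name.
  if looks_like_wav result.wav; then
    echo "result.wav contains WAV audio"
  else
    echo "result.wav is not WAV audio; open it to read the error" >&2
  fi
fi
```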

On the Windows command prompt (cmd), single quotes don't group arguments, so it is simpler to put the JSON in a file first.

Create a JSON file, input.json, with the following command:

echo {"text": "<p><s><prosody rate='+50%'>This is the first sentence of the paragraph.</prosody></s><s>Here is another sentence.</s><s>Finally, this is the last sentence.</s></p>"} > input.json

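If you run these commands from a .bat file instead of typing them at the prompt, write %% in place of % in the echo line. Then run cURL, which reads the request body from input.json and writes the audio to result.wav: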

curl -X POST -u "apikey:{API_KEY}" ^
--header "Accept: audio/wav" ^
--header "Content-Type: application/json" ^
--data @input.json ^
--output result.wav ^
"{URL}/v1/synthesize" -v

For the sentence in your question, put this JSON in input.json instead:

{"text":"<prosody rate='fast'>Adult capybaras are one meter long.</prosody>"}
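On Linux or macOS, the same file-based approach works with a quoted heredoc: because the delimiter is quoted ('EOF'), the shell writes the lines verbatim, so the double quotes, <, > and % need no extra handling. A minimal sketch using the question's sentence at +50%; TTS_APIKEY and TTS_URL are hypothetical variable names.

```shell
#!/bin/sh
set -eu

# Quoted heredoc: no variable expansion or quote processing happens here,
# so the JSON (including the escaped \" around +50%) is written as-is.
cat > input.json <<'EOF'
{"text": "<prosody rate=\"+50%\">Adult capybaras are one meter long.</prosody>"}
EOF

# Only call the service when TTS_APIKEY and TTS_URL (hypothetical names) are set.
if [ -n "${TTS_APIKEY:-}" ] && [ -n "${TTS_URL:-}" ]; then
  curl -X POST -u "apikey:${TTS_APIKEY}" \
    --header "Accept: audio/wav" \
    --header "Content-Type: application/json" \
    --data @input.json \
    --output result.wav \
    "${TTS_URL}/v1/synthesize?voice=en-US_HenryV3Voice"
fi
```

curl's --data @file strips newlines from the file, which is harmless for this one-line JSON; --data-binary @input.json sends the file unchanged.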

Here are some useful links I followed to create this code sample; they will help you understand the SSML attributes. Also, check the limitations of <prosody> in the links below.