Is there a way to convert speech directly into SSML?


Just as speech-to-text dictation tools convert spoken words into the corresponding text, I would like to know whether there are similar tools for converting spoken words into the corresponding SSML. That is, a tool that provides the text together with the SSML tags that capture the intonation, prosody, pauses/breaks, inflection, etc. present in the speaker's voice.


I work on building voice apps. In a recent project, we needed the text to sound exactly right, with all the associated intonation, prosody, pauses/breaks, inflection, etc. After extensive research, we found that the only ways to make text sound as if it were spoken by a real person are to use SSML (still not perfect) or a recorded mp3.

If you're trying to get a real-person feel for a project, the best way to do it is to use a human. I would suggest recording the mp3 (or getting it recorded by a professional) instead of trying to get SSML from voice.

The reason we use SSML is precisely that computers cannot understand the intonation, prosody, pauses/breaks, inflection, etc. of human speech.

If your goal is to get SSML, then the best way would be to convert text to SSML. For this, I'd suggest taking a peek here:

W3C SSML

Google SSML

Amazon SSML
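
To give a rough idea of what converting text to SSML can look like, here is a minimal sketch in Python. It is not taken from any of the references above: the function name `text_to_ssml` and the pause lengths are made up for illustration, and the punctuation rule is deliberately naive (it would also split decimals such as "3.5"). It only uses the standard `<speak>` and `<break>` elements from the W3C SSML spec; real tuning of prosody and emphasis would still be done by hand.

```python
import re
from xml.sax.saxutils import escape


def text_to_ssml(text, sentence_pause_ms=400, comma_pause_ms=150):
    """Wrap plain text in <speak> and add a <break> after punctuation.

    Hypothetical helper for illustration; pause lengths are arbitrary defaults.
    """
    parts = []
    # Grab runs of text together with the punctuation mark that ends them.
    for chunk in re.findall(r"[^,.!?]+[,.!?]?", text):
        chunk = chunk.strip()
        if not chunk:
            continue
        # Escape &, < and > so the result stays well-formed XML.
        parts.append(escape(chunk))
        if chunk[-1] in ".!?":
            parts.append(f'<break time="{sentence_pause_ms}ms"/>')
        elif chunk[-1] == ",":
            parts.append(f'<break time="{comma_pause_ms}ms"/>')
    return "<speak>" + " ".join(parts) + "</speak>"


if __name__ == "__main__":
    print(text_to_ssml("Hello, world. Tom & Jerry"))
```

The output is a single `<speak>` element with a short break after the comma and a longer one after the full stop. Pauses are the easy part; the intonation and inflection of a real speaker are what you would still have to mark up manually (or record).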

This is to the best of our knowledge as of mid-July 2018. If anyone has more info, please feel free to add to this answer.

Hope this helps :3