I am trying to port a TTS app that uses in-text control tags from desktop/web/iOS to Android. The app generates a text file containing the text to be spoken and the silent periods between the spoken words. Silent periods are represented with in-text control tags, such as the SAPI TTS <silence msec="1000"/> tag or the iOS TTS engine's [[slnc 10000]] tag.
The text sent to the SAPI TTS speech synthesizer looks like this:
Text one <silence msec="750"/> text two <silence msec="1000"/> text three <silence msec="500"/> Text four <silence msec="600"/> Text five.....
Similarly, for iOS TTS the in-text control tag for silence is [[slnc 10000]], and the text sent to the speech synthesizer looks like this:
Text one [[slnc 750]] text two [[slnc 10000]] text three [[slnc 500]] text four [[slnc 600]] text five......
Android TTS doesn't seem to support in-text control tags for the speech synthesizer. Also, the following two variants of the speak() method appear to use a Google web service, so keeping the spoken text coming back from the synthesizer server in sync with silent periods timed in my own code may be impossible, or unreliable at best.
speak(speech, TextToSpeech.QUEUE_FLUSH, null);
speak(speech, TextToSpeech.QUEUE_ADD, null);
I welcome any Android solution that focuses on preserving accurate timing of silence periods between spoken words.
The Android TTS engine has the deprecated playSilence() and the newer playSilentUtterance() methods, which can be used to pause the speech output for a given amount of time.
If the app targets API level 21 (Android 5.0) as the minimum, then playSilentUtterance() should be used. Otherwise the deprecated playSilence() is still available.
The complete method signature of the playSilentUtterance method is:
public int playSilentUtterance(long durationInMs, int queueMode, String utteranceId)
Here durationInMs is the duration of the silence in milliseconds.
The queueMode can be either QUEUE_ADD, which means that the silence is played after the TTS engine has finished what it is currently speaking and what was already added to the queue, or QUEUE_FLUSH, which stops everything first and clears the queue, so the silence is played right away.
Finally, the utteranceId is an optional unique identifier for the text (or in this case silence) to be spoken and is useful if using an UtteranceProgressListener.
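To reuse the existing tagged text files on Android, one option is to split the text at each silence tag and queue the pieces one by one. The following is only a rough sketch of that idea in plain Java, not a tested Android implementation: SilenceTagParser, Segment and parse are hypothetical names made up for this example, and the regular expression assumes the tags appear exactly in the two forms shown in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits tagged text into speech and silence segments. */
class SilenceTagParser {

    /** One queue entry: text to speak, or a pause when text is null. */
    static final class Segment {
        final String text;     // null for a silence segment
        final long silenceMs;  // 0 for a speech segment

        Segment(String text, long silenceMs) {
            this.text = text;
            this.silenceMs = silenceMs;
        }

        boolean isSilence() {
            return text == null;
        }
    }

    // Assumed tag forms: SAPI <silence msec="750"/> and iOS [[slnc 750]].
    private static final Pattern TAG = Pattern.compile(
            "<silence\\s+msec=\"(\\d+)\"\\s*/>|\\[\\[slnc\\s+(\\d+)\\]\\]");

    /** Returns speech and silence segments in their original order. */
    static List<Segment> parse(String input) {
        List<Segment> segments = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        int pos = 0;
        while (m.find()) {
            // Text between the previous tag (or the start) and this tag.
            addSpeech(segments, input.substring(pos, m.start()));
            // Exactly one of the two capture groups is set per match.
            String ms = m.group(1) != null ? m.group(1) : m.group(2);
            segments.add(new Segment(null, Long.parseLong(ms)));
            pos = m.end();
        }
        // Trailing text after the last tag.
        addSpeech(segments, input.substring(pos));
        return segments;
    }

    // Skips whitespace-only chunks, e.g. between two adjacent tags.
    private static void addSpeech(List<Segment> segments, String raw) {
        String text = raw.trim();
        if (!text.isEmpty()) {
            segments.add(new Segment(text, 0));
        }
    }
}
```

On Android (API level 21 or higher), each segment could then be queued in order: tts.speak(segment.text, TextToSpeech.QUEUE_ADD, null, id) for speech and tts.playSilentUtterance(segment.silenceMs, TextToSpeech.QUEUE_ADD, id) for silence, where tts is an initialized TextToSpeech instance and id is any unique string. Because every entry is added with QUEUE_ADD, the engine itself plays the pauses between utterances, so the app does not have to time them in its own code.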