I am developing an Android app that incorporates Text-to-Speech (TTS) functionality. In my application, users often do not have all the necessary TTS language files downloaded. Currently, I use the ACTION_INSTALL_TTS_DATA intent to open a system window, prompting the user to choose and download the required language files.
However, I would like to provide a more seamless user experience by avoiding the redirection to the system window and initiating a download process directly within my app. My initial thought was to store TTS language files on an S3 bucket and then download and preload them into the TTS engine on the user's device.
Is there a workaround to pre-load TTS languages without using the ACTION_INSTALL_TTS_DATA intent, allowing users to stay within the app rather than being redirected to the system window? If so, how can I achieve this or are there alternative approaches that could address this use case?
Current code:
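    // Runs inside an Activity; needs android.content.Intent,
    // android.speech.tts.TextToSpeech and java.util.Locale.
    var tts: TextToSpeech? = null
    tts = TextToSpeech(this) { status ->
        if (status == TextToSpeech.SUCCESS) {
            // availableLanguages may be null if the engine query fails
            val availableLanguages = tts?.availableLanguages.orEmpty()
            if (!availableLanguages.contains(Locale.forLanguageTag("ro"))) {
                // Opens the engine's voice-data installer (the system window)
                val intent = Intent(TextToSpeech.Engine.ACTION_INSTALL_TTS_DATA)
                startActivity(intent)
            }
        }
    }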
The main issue is that on Android there is unfortunately no way to know in advance which TTS engine the end user will have installed, and behavior varies among engines (and even between versions of the same engine). Some engines may not support certain languages at all. You can query at runtime which engines are installed and which one is the default, but you would then have to adapt your approach per engine, and even then you can't predict what engines will exist in the future.
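To illustrate the runtime query mentioned above, here is a minimal sketch using `TextToSpeech.getDefaultEngine()` and `TextToSpeech.getEngines()`. The function name `logInstalledEngines` and the log tag are made up for this example, and on Android 11+ the app may need a `<queries>` entry for the TTS service intent in its manifest before other engines are visible.

```kotlin
import android.content.Context
import android.speech.tts.TextToSpeech
import android.util.Log

// Hypothetical helper: logs the default engine and every installed engine.
fun logInstalledEngines(context: Context) {
    var tts: TextToSpeech? = null
    tts = TextToSpeech(context) { status ->
        // Copy to a val so the nullable captured variable can be smart-cast.
        val engine = tts
        if (status == TextToSpeech.SUCCESS && engine != null) {
            Log.d("TtsEngines", "Default engine: ${engine.defaultEngine}")
            for (info in engine.engines) {
                // EngineInfo.name is the engine's package name.
                Log.d("TtsEngines", "Installed: ${info.name} (${info.label})")
            }
        }
        // This instance was only needed for the query, so release it.
        engine?.shutdown()
    }
}
```

The package names this prints are what you would have to branch on if you decided to handle engines differently.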
Some engines (Google's, I believe) may download a language automatically when you attempt to use one that isn't installed, but there will still be a delay while that happens, and you'd have to decide how to handle it in your app.
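One way to prepare for that delay is to ask the engine about a language before speaking. The sketch below maps the result of `TextToSpeech.isLanguageAvailable()` onto a small enum; `LanguageStatus` and `checkLanguage` are names invented for this example. Treat the answer as a hint only, since an engine that downloads voices on demand may report a language as available before its data is actually on the device.

```kotlin
import android.speech.tts.TextToSpeech
import java.util.Locale

// Hypothetical enum for this example, not part of the Android API.
enum class LanguageStatus { READY, MISSING_DATA, NOT_SUPPORTED }

// Call this only after the TextToSpeech init callback has reported SUCCESS.
fun checkLanguage(tts: TextToSpeech, locale: Locale): LanguageStatus =
    when (tts.isLanguageAvailable(locale)) {
        TextToSpeech.LANG_MISSING_DATA -> LanguageStatus.MISSING_DATA
        TextToSpeech.LANG_NOT_SUPPORTED -> LanguageStatus.NOT_SUPPORTED
        // LANG_AVAILABLE, LANG_COUNTRY_AVAILABLE or LANG_COUNTRY_VAR_AVAILABLE
        else -> LanguageStatus.READY
    }
```

For the Romanian case in the question, `checkLanguage(tts, Locale.forLanguageTag("ro"))` returning `MISSING_DATA` would be the point to show an in-app explanation before sending the user to the installer.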
Basically, in my experience, you have these choices, in order from best to worst IMO:
1. Use an online (cloud) TTS service instead of the on-device engine. This removes the unpredictability, and the desired voices are always available. The only drawbacks are that the user needs an internet connection and you may have to pay for the service.
2. Just go with what you have now. It fits the way Android was designed, and it's not your fault if it's slightly inconvenient.
3. Use a static TTS library in your app (not exactly straightforward). This means you're no longer using the external Android TTS "service"; the TTS is built into your app. As far as I know this is not really feasible without using very outdated TTS engines that don't sound very good, and it would be a very large undertaking. Maybe something newer exists nowadays, though, perhaps using on-device AI.
4. Decide whether you are OK with forcing the user to install the latest Google TTS engine before your app will run. This would narrow things down to a more predictable scenario, but it's still not certain whether you can download voices without diverting the user.
5. Try to accommodate all possible user configurations (not feasible).
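For the option of requiring Google's engine, a rough sketch of the check might look like this. It relies on the three-argument `TextToSpeech` constructor, which takes an engine package name, and on `com.google.android.tts`, the package of Google's TTS engine. The helper names `isEngineInstalled` and `createGoogleTts` are invented here. My understanding is that the framework may fall back to another engine if the requested one is missing, so it is worth checking the installed list first.

```kotlin
import android.content.Context
import android.speech.tts.TextToSpeech

// Package name of Google's TTS engine.
const val GOOGLE_TTS_PACKAGE = "com.google.android.tts"

// Hypothetical helper: true if the given engine package is among the
// engines that an already-initialized TextToSpeech instance can see.
fun isEngineInstalled(tts: TextToSpeech, enginePackage: String): Boolean =
    tts.engines.any { it.name == enginePackage }

// Hypothetical helper: requests the Google engine explicitly.
// onReady receives true when initialization succeeded.
fun createGoogleTts(context: Context, onReady: (Boolean) -> Unit): TextToSpeech =
    TextToSpeech(
        context,
        TextToSpeech.OnInitListener { status ->
            onReady(status == TextToSpeech.SUCCESS)
        },
        GOOGLE_TTS_PACKAGE
    )
```

If `isEngineInstalled` returns false, the app could explain why the engine is needed and link to its store page. This does not settle whether voices can then be downloaded without leaving the app.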