AWS Sumerian host: https://github.com/aws-samples/amazon-sumerian-hosts
The example shown seems to be tightly integrated with Amazon Polly (in both the ThreeJS and BabylonJS versions). Is there a way to use the provided 3D assets with a self-hosted Text-To-Speech (TTS) service, without losing the avatar's visual syncing functionality such as lip sync animation?
I am thinking of using open source tools like Mimic3 or the Web Speech API for TTS to avoid incurring AWS costs. However, Amazon Polly provides things like speech marks, which help sync the audio with the animation of the 3D avatar. Is there a way to replicate this functionality with other TTS tools so that they are compatible with Sumerian Hosts, or are there any other workarounds? If yes, what steps should I take? Thanks.
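Without Amazon Polly, you need to find an alternative text-to-speech library and integrate it yourself. For lip sync, the replacement also has to give you timing for each viseme (or phonemes you can map to visemes), which is what Polly's speech marks provide.
See this example of Amazon's Sumerian demo modified to use Azure text-to-speech instead: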
Make a realtime realistic 3D avatar with text-to-speech, Viseme Lip-sync, and emotions/gestures
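
If your self-hosted TTS only gives you phonemes with start times (that is an assumption, so check what your engine actually exposes), one workaround is to convert them into Polly-style viseme speech marks yourself. Below is a minimal sketch of that conversion. The `PHONEME_TO_VISEME` table is partial and only illustrative: it uses ARPAbet-style phoneme names as a hypothetical input format and loosely follows Polly's viseme symbols, so verify it against Polly's documented viseme table. The function names are my own, not part of any library.

```python
import json

# Partial, illustrative mapping from ARPAbet-style phonemes (assumed input
# format) to Polly-style viseme symbols. Extend and verify before real use.
PHONEME_TO_VISEME = {
    "P": "p", "B": "p", "M": "p",
    "F": "f", "V": "f",
    "TH": "T", "DH": "T",
    "T": "t", "D": "t", "N": "t", "L": "t",
    "S": "s", "Z": "s",
    "SH": "S", "ZH": "S", "CH": "S", "JH": "S",
    "K": "k", "G": "k", "NG": "k", "HH": "k",
    "R": "r",
    "AA": "a", "AE": "a",
    "IY": "i", "IH": "i",
    "UW": "u", "UH": "u",
    "OW": "o",
}


def to_speechmarks(phonemes):
    """Convert (phoneme, start_seconds) pairs into viseme speech marks.

    Each mark is a dict shaped like a Polly viseme speech mark:
    {"time": <milliseconds>, "type": "viseme", "value": <viseme>}.
    """
    marks = []
    for symbol, start in phonemes:
        # Unknown phonemes fall back to "sil" (closed mouth); this is a
        # simplification chosen for the sketch.
        viseme = PHONEME_TO_VISEME.get(symbol.upper(), "sil")
        # Consecutive phonemes with the same mouth shape need only one mark.
        if marks and marks[-1]["value"] == viseme:
            continue
        marks.append({"time": round(start * 1000), "type": "viseme", "value": viseme})
    return marks


def to_json_lines(marks):
    """Serialize marks as one compact JSON object per line."""
    return "\n".join(json.dumps(m, separators=(",", ":")) for m in marks)
```

For example, `to_json_lines(to_speechmarks([("P", 0.0), ("AA", 0.1)]))` yields one JSON object per line, each with `time`, `type` and `value` fields. How you feed these marks into the host's lip sync is not covered here; it depends on how you replace the Polly calls in the host code.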