We have a use case to apply voice morphing to call recordings so that the customer and agent cannot be recognized. The morphed recordings will be used for training purposes. I am looking for a simple, inexpensive solution. Has anyone implemented this use case, and can you share your experience?
Here are a few ideas I got from people around me:
- Convert the original recording to text using Amazon Transcribe (voice → text), then convert it back to a different voice using Amazon Polly (text → new voice). This may lose the tempo and intonation of the original agent's voice.
- Third-party libraries that we can license (I am not aware of any).
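
To make the first idea concrete, here is a rough, untested-against-AWS sketch of how I imagine it in Python. It assumes boto3-style clients are passed in, that the recording already sits at an S3 URI, and that no output bucket is configured (so Transcribe returns a downloadable transcript URL). The function names, the `Joanna` voice, and the 2500-character chunk size are my own placeholders, chosen to stay under what I understand to be Polly's per-request text limit. Note that it flattens both speakers into a single synthetic voice.

```python
import json
import time
import urllib.request


def transcribe_recording(transcribe, job_name, media_uri,
                         language_code="en-US", poll_seconds=5):
    """Run an Amazon Transcribe job and return the plain transcript text."""
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": media_uri},
        LanguageCode=language_code,
    )
    # Poll until the job finishes; a real pipeline would add a timeout.
    while True:
        job = transcribe.get_transcription_job(
            TranscriptionJobName=job_name)["TranscriptionJob"]
        status = job["TranscriptionJobStatus"]
        if status == "COMPLETED":
            break
        if status == "FAILED":
            raise RuntimeError(job.get("FailureReason", "transcription failed"))
        time.sleep(poll_seconds)
    # Assumption: the transcript URI is directly downloadable.
    with urllib.request.urlopen(job["Transcript"]["TranscriptFileUri"]) as resp:
        result = json.load(resp)
    return result["results"]["transcripts"][0]["transcript"]


def split_text(text, max_chars=2500):
    """Split text on whitespace into chunks of at most max_chars characters
    (a single word longer than max_chars is kept whole)."""
    chunks, current = [], ""
    for word in text.split():
        if current and len(current) + 1 + len(word) > max_chars:
            chunks.append(current)
            current = word
        else:
            current = f"{current} {word}".strip()
    if current:
        chunks.append(current)
    return chunks


def resynthesize(polly, text, voice_id="Joanna"):
    """Speak the transcript in a different voice and return MP3 bytes."""
    audio = b""
    for chunk in split_text(text):
        response = polly.synthesize_speech(
            Text=chunk, OutputFormat="mp3", VoiceId=voice_id)
        # Naive concatenation of MP3 segments; fine for a sketch.
        audio += response["AudioStream"].read()
    return audio
```

It would be called with `boto3.client("transcribe")` and `boto3.client("polly")`, roughly `resynthesize(polly, transcribe_recording(transcribe, "job-1", "s3://my-bucket/call.wav"))`, where the job name and bucket path are made up. As the sketch makes plain, only the words survive the round trip, which is exactly the tempo and intonation concern above.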