I am following the Google Cloud Text-to-Speech API Python tutorial. I would like to know whether there is a way to return the phonemes and their durations, which are an intermediate step in generating the interpreted speech. Is that possible? If so, could you please point me to the documentation and, ideally, some sample code that does this? I searched and could not find anything that already answers my question.
Thanks! gma
Here are all the steps to get phonemes from the Google Cloud Text-to-Speech API. The sample code belongs in Part-3. Follow these steps:
[Part-1]
In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.
Make sure that billing is enabled for your Cloud project.
Enable the Cloud Text-to-Speech API.
Create a service account:
a. In the Cloud Console, go to the Create service account page.
b. Select a project.
c. In the Service account name field, enter a name. The Cloud Console fills in the Service account ID field based on this name.
d. Click Done to finish creating the service account. Do not close your browser window. You will use it in the next step.
Create a service account key:
a. In the Cloud Console, click the email address for the service account that you created.
b. Click Keys.
c. Click Add key, then click Create new key.
d. Click Create. A JSON key file is downloaded to your computer.
e. Click Close.
Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the JSON file that contains your service account key. This variable only applies to your current shell session, so if you open a new session, set the variable again.
Example 1. Linux or macOS
export GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"
Replace KEY_PATH with the path of the JSON file that contains your service account key.
For example:
export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/service-account-file.json"
Example 2. Windows
For PowerShell:
$env:GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"
Replace KEY_PATH with the path of the JSON file that contains your service account key.
For example:
$env:GOOGLE_APPLICATION_CREDENTIALS="C:\Users\username\Downloads\service-account-file.json"
For Command Prompt:
set GOOGLE_APPLICATION_CREDENTIALS=KEY_PATH
Replace KEY_PATH with the path of the JSON file that contains your service account key.
Install and initialize the Cloud SDK.
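Before moving on to Part-2, it can help to confirm that the environment variable from the steps above is visible to Python and points at a usable key file. The following is only a minimal sketch: the helper name check_credentials is made up for illustration, and the check on the "type" field assumes the usual layout of a downloaded service account key.

```python
import json
import os


def check_credentials(env=None):
    """Return (ok, message) describing the state of GOOGLE_APPLICATION_CREDENTIALS.

    check_credentials is a hypothetical helper, not part of any Google library.
    """
    env = os.environ if env is None else env
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not key_path:
        return False, "GOOGLE_APPLICATION_CREDENTIALS is not set in this session"
    if not os.path.isfile(key_path):
        return False, f"no file found at {key_path}"
    try:
        with open(key_path, encoding="utf-8") as fh:
            key = json.load(fh)
    except (OSError, ValueError) as exc:
        return False, f"could not read {key_path} as JSON: {exc}"
    # Assumption: a service account key file has "type" set to "service_account".
    if not isinstance(key, dict) or key.get("type") != "service_account":
        return False, f"{key_path} does not look like a service account key"
    return True, f"using service account key at {key_path}"


ok, message = check_credentials()
print(message)
```

If this prints that the variable is not set, the shell session was probably reopened after the export or set command, and the variable needs to be set again.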
[Part-2]
Install the client library
pip install --upgrade google-cloud-texttospeech
[Part-3]
Create audio data
Now you can use Text-to-Speech to create an audio file of synthetic human speech. Use the following code to send a synthesize request to the Text-to-Speech API.
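The code for this step did not survive in the post, so here is a minimal sketch of a synthesize request, modeled on the pattern in the official Python quickstart. The function name synthesize_to_file, the default file name output.mp3, and the voice settings are illustrative choices, not something taken from the original answer. Note that this sketch only writes the audio bytes returned in the response.

```python
def synthesize_to_file(text, output_path="output.mp3", language_code="en-US"):
    """Send one synthesize request and write the returned audio to output_path.

    synthesize_to_file is a hypothetical wrapper; the calls inside it come
    from the google-cloud-texttospeech client library.
    """
    # Imported here so this file can be loaded before the library is installed.
    from google.cloud import texttospeech

    # The client reads GOOGLE_APPLICATION_CREDENTIALS from the environment.
    client = texttospeech.TextToSpeechClient()

    # The text to be synthesized.
    synthesis_input = texttospeech.SynthesisInput(text=text)

    # Voice selection: language plus a neutral SSML gender (illustrative values).
    voice = texttospeech.VoiceSelectionParams(
        language_code=language_code,
        ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL,
    )

    # Ask for MP3-encoded audio.
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )

    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    # response.audio_content holds the binary audio data.
    with open(output_path, "wb") as out:
        out.write(response.audio_content)
    return output_path


# Example call (needs network access, billing, and valid credentials):
# synthesize_to_file("Hello, World!")
```

Keeping the request in a function makes it easy to try different input strings or language codes without repeating the client setup.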
If you run into any issues, refer to the link below:
https://cloud.google.com/text-to-speech/docs/quickstart-client-libraries#client-libraries-install-python