In an online meeting such as Google Meet or Zoom, I want to detect when the speaker changes and then transcribe the audio separately for each speaker.
I am using the DeepSpeech model for speech-to-text and have fine-tuned it for Indian-accented English, but now I want to add speaker diarization. Is there a way to do this? I don't need to identify users by name; I just want to find which parts of the audio were spoken by which speaker.
DeepSpeech does not include any functionality for speaker diarization or speaker recognition. Adding it would require significantly changing the model architecture and re-training the model.
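You may wish to look at Whisper from OpenAI, an end-to-end model trained on several tasks at once (multilingual transcription, speech translation and language identification). Note, however, that Whisper does not perform speaker diarization by itself; it is usually paired with a separate diarization model whose speaker turns are then matched against Whisper's timestamped transcript.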
https://openai.com/blog/whisper/
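Whichever speech-to-text model you use, the diarization step and the transcription step typically produce separate timestamped outputs, so the last step is aligning them by time. The sketch below assumes you already have speaker turns (for example from a diarization toolkit such as pyannote.audio) and transcript segments with start/end times (Whisper's `transcribe` result includes per-segment `start` and `end`). The `Turn` and `Segment` classes and the `assign_speakers` / `merge_consecutive` helpers are hypothetical glue code, not part of either library's API; each segment is simply labelled with the speaker whose turns overlap it the most.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """A speaker turn from a diarization step (hypothetical structure)."""
    speaker: str
    start: float  # seconds
    end: float


@dataclass
class Segment:
    """A transcribed span with timestamps from the speech-to-text model."""
    text: str
    start: float  # seconds
    end: float


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """Label each segment with the speaker whose turns overlap it the most."""
    labelled = []
    for seg in segments:
        totals = {}
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + ov
        # Segments that no turn covers are marked as UNKNOWN.
        speaker = max(totals, key=totals.get) if totals else "UNKNOWN"
        labelled.append((speaker, seg.text))
    return labelled


def merge_consecutive(labelled):
    """Join adjacent segments from the same speaker into one utterance."""
    merged = []
    for speaker, text in labelled:
        if merged and merged[-1][0] == speaker:
            merged[-1] = (speaker, merged[-1][1] + " " + text)
        else:
            merged.append((speaker, text))
    return merged


if __name__ == "__main__":
    turns = [Turn("SPEAKER_00", 0.0, 4.2), Turn("SPEAKER_01", 4.2, 9.0)]
    segments = [
        Segment("Hi, can you hear me?", 0.3, 2.0),
        Segment("Let's start.", 2.1, 4.0),
        Segment("Yes, loud and clear.", 4.5, 6.1),
    ]
    for speaker, text in merge_consecutive(assign_speakers(segments, turns)):
        print(f"{speaker}: {text}")
    # SPEAKER_00: Hi, can you hear me? Let's start.
    # SPEAKER_01: Yes, loud and clear.
```

Assigning by maximum overlap rather than by a segment's start time is a deliberate choice: transcript segment boundaries rarely line up exactly with diarization turn boundaries, and overlap is more tolerant of small offsets.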