Can speaker diarization be integrated with DeepSpeech?

In an online meeting such as Google Meet or Zoom, I want to detect when the speaker changes and then transcribe the audio separately for each speaker.

I am using a DeepSpeech model for speech-to-text. I have fine-tuned the model for Indian-accented English, and I now want to add speaker diarization to it. Is there a way to do this? I don't need to identify speakers by name; I just want to find which parts of the audio were spoken by different speakers.

There is 1 best solution below

DeepSpeech does not include any functionality for speaker diarization or speaker recognition. Adding that capability to the model itself would mean changing the architecture significantly and retraining it.
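
If retraining is not an option, one commonly used workaround is to leave DeepSpeech unchanged, run diarization as a separate step, and then transcribe each speaker turn on its own. Below is a minimal Python sketch of that glue code. It assumes 16 kHz, 16-bit mono PCM audio and that some external diarization tool has already produced anonymous speaker turns with start and end times. `Turn`, `slice_pcm`, `transcribe_turns`, and the `transcribe` callable are hypothetical names used for illustration, not part of DeepSpeech.

```python
from typing import Callable, List, NamedTuple, Tuple

SAMPLE_RATE = 16000    # assumption: 16 kHz mono audio
BYTES_PER_SAMPLE = 2   # assumption: 16-bit PCM samples


class Turn(NamedTuple):
    """One speaker turn, as produced by an external diarization step."""
    speaker: str   # anonymous label such as "spk0", not a real name
    start: float   # turn start, in seconds
    end: float     # turn end, in seconds


def slice_pcm(pcm: bytes, start: float, end: float) -> bytes:
    """Return the raw PCM bytes between start and end (in seconds)."""
    first = int(start * SAMPLE_RATE) * BYTES_PER_SAMPLE
    last = int(end * SAMPLE_RATE) * BYTES_PER_SAMPLE
    return pcm[first:last]


def transcribe_turns(
    pcm: bytes,
    turns: List[Turn],
    transcribe: Callable[[bytes], str],
) -> List[Tuple[str, str]]:
    """Transcribe each turn separately, in chronological order.

    `transcribe` is a hypothetical callable wrapping whatever
    speech-to-text model is in use.
    """
    results = []
    for turn in sorted(turns, key=lambda t: t.start):
        chunk = slice_pcm(pcm, turn.start, turn.end)
        if chunk:  # skip turns that fall outside the recording
            results.append((turn.speaker, transcribe(chunk)))
    return results
```

With the DeepSpeech Python bindings, `transcribe` could plausibly wrap `Model.stt`, which takes a 16-bit integer NumPy array, for example `lambda chunk: model.stt(np.frombuffer(chunk, dtype=np.int16))`. The quality of the result depends mostly on how accurate the diarization turns are, since words that straddle a turn boundary may be cut off.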

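You may wish to look at Whisper from OpenAI, which is an end-to-end model trained for several tasks at once: multilingual transcription, translation into English, language identification, and voice activity detection. Note that Whisper does not label speakers on its own, so it would still need to be paired with a separate diarization step.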

https://openai.com/blog/whisper/