How to implement real-time transcription with speaker identification for Google Meet


I'm developing a tool that can automatically join a Google Meet session, record the audio, and generate real-time notes that are aware of who is speaking. The tool should identify speakers and accurately associate their spoken words with their names.

Is there an official Google API available for this purpose, or are there any other recommended approaches to achieve this functionality?

I attempted to implement this using Google Cloud Speech-to-Text, but I found that the service requires the meeting to be pre-recorded before it can transcribe the audio. Additionally, its speaker recognition was not satisfactory, because we can't get the actual speaker names. I have also tried scraping the Google Meet captions, but that does not seem to be a reliable solution. I want something like webkitSpeechRecognition, but with speaker identification.

There is 1 answer below

user2207488

Is there an official Google API available for this purpose, or are there any other recommended approaches to achieve this functionality?

Looks like part of the problem might be addressed by this new Google Meet API, though it's still in preview: https://developers.google.com/meet/api/guides/overview
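
For the speaker-attribution half, here is a minimal sketch of turning transcript entries into notes with names attached. It does not call the API. It assumes you have already fetched the entries as dicts, and the field names (`participant`, `text`, `startTime`) are my assumption about what the API returns, so check them against the reference. The `names` lookup and the sample data are made up for illustration.

```python
from datetime import datetime


def parse_time(ts):
    # Accept a trailing "Z" (UTC), which older Python versions reject
    # in datetime.fromisoformat.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def build_notes(entries, names):
    """Return (speaker, text) pairs in time order, merging consecutive
    entries spoken by the same participant."""
    notes = []
    for entry in sorted(entries, key=lambda e: parse_time(e["startTime"])):
        # Fall back to a placeholder when the participant is not in the lookup.
        speaker = names.get(entry["participant"], "Unknown speaker")
        text = entry["text"].strip()
        if notes and notes[-1][0] == speaker:
            notes[-1] = (speaker, notes[-1][1] + " " + text)
        else:
            notes.append((speaker, text))
    return notes


# Hypothetical data: participant resource names mapped to display names.
names = {
    "conferenceRecords/abc/participants/p1": "Alice",
    "conferenceRecords/abc/participants/p2": "Bob",
}
entries = [
    {"participant": "conferenceRecords/abc/participants/p1",
     "text": "Let's start.", "startTime": "2024-01-01T10:00:00Z"},
    {"participant": "conferenceRecords/abc/participants/p2",
     "text": "Sounds good.", "startTime": "2024-01-01T10:00:09Z"},
    {"participant": "conferenceRecords/abc/participants/p1",
     "text": "First item is the roadmap.", "startTime": "2024-01-01T10:00:04Z"},
]

notes = build_notes(entries, names)
for speaker, text in notes:
    print(f"{speaker}: {text}")
# Alice: Let's start. First item is the roadmap.
# Bob: Sounds good.
```

Sorting by start time before merging keeps the notes in spoken order even if the entries arrive out of order. This only covers attaching names to text, not the real-time part of the question.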