I have a question about the Whisper neural network. I have two-channel recordings of phone calls. How do I transcribe a .wav file to a text file with labels indicating which party is speaking? For example: operator: ... client: ...
I tried working with each channel separately: I wrote them into two files, transcribed each, and combined the results. I wanted to know if there is a simpler solution.
If I understood correctly, you also want to add information about which speaker says what, right? From what I know, the way you approached it is the simplest one (https://github.com/openai/whisper/discussions/1026). A minimal sketch of that approach is below.
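For reference, here is how that split-and-merge approach could look, assuming channel 0 is the operator and channel 1 is the client (swap them if your telephony setup differs), and using the soundfile package to split the stereo file:

```python
import soundfile as sf
import whisper

# Split the stereo recording into one mono file per party.
# Assumption: channel 0 is the operator, channel 1 is the client.
data, sr = sf.read("call.wav")
sf.write("operator.wav", data[:, 0], sr)
sf.write("client.wav", data[:, 1], sr)

model = whisper.load_model("small")

# Transcribe each channel and tag every segment with its speaker.
segments = []
for path, speaker in [("operator.wav", "operator"), ("client.wav", "client")]:
    result = model.transcribe(path)
    for seg in result["segments"]:
        segments.append((seg["start"], speaker, seg["text"].strip()))

# Interleave the two transcripts by segment start time.
for start, speaker, text in sorted(segments):
    print(f"[{start:7.2f}s] {speaker}: {text}")
```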
You could also merge the two channels into one (make the audio mono), but then you might run into issues where the speakers sometimes overlap each other. If you do want to go this route, you can output timestamps alongside the text; then, if you know when each speaker talks in the recording, you can match the output timestamps to the corresponding speaker (see the sketch below). Whisper already outputs segment-level timestamps; if you need word-level timestamps, you can use this Whisper implementation: https://github.com/linto-ai/whisper-timestamped.
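To illustrate the mono route, a sketch of matching Whisper's segment-level timestamps against known speaker intervals; the speaker_turns list here is hypothetical and would come from your telephony system or a separate diarization step:

```python
import whisper

model = whisper.load_model("small")
result = model.transcribe("call_mono.wav")

# Hypothetical: intervals (start, end, speaker) in seconds during
# which each party is known to be talking.
speaker_turns = [
    (0.0, 4.2, "operator"),
    (4.2, 9.8, "client"),
]

def speaker_at(t):
    """Return the speaker whose interval contains time t."""
    for start, end, who in speaker_turns:
        if start <= t < end:
            return who
    return "unknown"

# Label each segment by the speaker active at its start time.
for seg in result["segments"]:
    print(f"{speaker_at(seg['start'])}: {seg['text'].strip()}")
```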
I also just found out that the native implementation of Whisper has support for word-level timestamps (if you add `word_timestamps=True` in the `.transcribe()` command, see https://github.com/openai/whisper/blob/main/whisper/transcribe.py).
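A quick usage sketch of that option (each segment then carries a "words" list with per-word start and end times):

```python
import whisper

model = whisper.load_model("small")
result = model.transcribe("call_mono.wav", word_timestamps=True)

# With word_timestamps=True, every segment contains a "words" list,
# each entry holding the word plus its start/end time in seconds.
for seg in result["segments"]:
    for word in seg["words"]:
        print(f"{word['start']:6.2f}-{word['end']:6.2f}  {word['word']}")
```

Hope this helps!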