How can I get the start and end times of words in an audio file with a known transcript using Vosk?

1.1k Views Asked by Jason Maldonis At 27 November 2022 at 02:44

I'm using Vosk (https://alphacephei.com/vosk/) in Python. I have the transcript of an audio file, and I want to get the start and end times of every word in that file.

I'm using some code I found online to perform speech-to-text using Vosk, and it also gives the start and end times of every word. Unfortunately the transcription isn't perfect.

Since I have the perfect transcript, I want to tell Vosk what the correct transcript is and have it tell me the start and end times of every word. Is this possible?

Here is the code I'm using now:

import wave
import json

from vosk import Model, KaldiRecognizer

model_path = r".\vosk_models\vosk-model-en-us-0.22"
audio_filename = "some_audio_file.wav"

model = Model(model_path)
wf = wave.open(audio_filename, "rb")
rec = KaldiRecognizer(model, wf.getframerate())
rec.SetWords(True)  # Include the start and end times for each word in the output

# get the list of JSON dictionaries
results = []
# recognize speech using vosk model
while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        part_result = json.loads(rec.Result())
        results.append(part_result)
part_result = json.loads(rec.FinalResult())
results.append(part_result)

wf.close()  # close audiofile
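
For reference, here is a minimal sketch of how the per-word times could be pulled out of `results`. It assumes each entry has the shape Vosk produces when `SetWords(True)` is set: a "result" list of dictionaries with "word", "start" and "end" keys (times in seconds). The helper name `collect_word_timings` is hypothetical and not part of Vosk.

```python
from typing import Any


def collect_word_timings(results: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Flatten a list of Vosk result dicts into one list of word entries.

    Hypothetical helper: assumes each dict may contain a "result" list whose
    items carry "word", "start" and "end" (seconds from the start of the audio).
    """
    words = []
    for part in results:
        # Segments in which nothing was recognized have no "result" key.
        for entry in part.get("result", []):
            words.append(
                {"word": entry["word"], "start": entry["start"], "end": entry["end"]}
            )
    return words
```

Calling `collect_word_timings(results)` after the loop above would give one flat list of recognized words with their times, which is the data that still has to be reconciled with the known transcript.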

There is 1 best solution below

J.M. Robles On 14 January 2023 at 16:58

Perhaps you could use sttcast. It uses Vosk to transcribe audio to an HTML file, from which you can collect the timestamps and the text to correct. I think the task could be automated if you have hundreds of hours of audio, but for only a few hours you should consider doing it manually.
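
One way the automated correction might look is sketched below. This is not forced alignment and not something sttcast or Vosk provides: it uses Python's standard `difflib.SequenceMatcher` to line up the recognized words with the known transcript, copies the timestamps where the words match, and spreads the time span evenly where they differ. The function name `align_transcript` and the input shape (a list of dictionaries with "word", "start" and "end") are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def align_transcript(recognized, transcript_words):
    """Map recognized word timings onto the words of a known transcript.

    recognized: list of {"word", "start", "end"} dicts (assumed Vosk-like shape).
    transcript_words: list of str, the correct transcript split into words.
    Returns one {"word", "start", "end"} dict per transcript word; timings are
    None where the recognizer produced nothing to borrow a time from.
    """
    rec_tokens = [w["word"].lower() for w in recognized]
    ref_tokens = [w.lower().strip(".,!?;:\"") for w in transcript_words]
    matcher = SequenceMatcher(a=rec_tokens, b=ref_tokens, autojunk=False)

    aligned = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Same words on both sides: reuse the recognized timings directly.
            for i, j in zip(range(i1, i2), range(j1, j2)):
                aligned.append({"word": transcript_words[j],
                                "start": recognized[i]["start"],
                                "end": recognized[i]["end"]})
        elif tag == "replace":
            # Misrecognized stretch: split its time span evenly (an approximation).
            span_start = recognized[i1]["start"]
            span_end = recognized[i2 - 1]["end"]
            count = j2 - j1
            step = (span_end - span_start) / count
            for k in range(count):
                aligned.append({"word": transcript_words[j1 + k],
                                "start": span_start + k * step,
                                "end": span_start + (k + 1) * step})
        elif tag == "insert":
            # Transcript words the recognizer missed entirely: no timing available.
            for j in range(j1, j2):
                aligned.append({"word": transcript_words[j], "start": None, "end": None})
        # "delete": extra recognized words absent from the transcript are dropped.
    return aligned
```

The interpolated times are only rough guesses, so entries from mismatched stretches would still be worth checking by hand. For accurate times on every word, a dedicated forced-alignment tool is probably the better fit.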
