I am working with Vosk and I need to get the start and end time of each word in my file.mp3.
This is my code:
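    import json
    from pydub import AudioSegment
    from vosk import Model, KaldiRecognizer

    FRAME_RATE = 16000  # assumed value; not shown in the original post
    CHANNELS = 1        # assumed value (mono); not shown in the original post

    def voice_recognition(filename):
        model = Model(model_name="vosk-model-fa-0.5")
        rec = KaldiRecognizer(model, FRAME_RATE)
        rec.SetWords(True)  # ask for per-word details in each result
        mp3 = AudioSegment.from_mp3(filename)
        mp3 = mp3.set_channels(CHANNELS)
        mp3 = mp3.set_frame_rate(FRAME_RATE)
        step = 45000  # chunk length in milliseconds (pydub slices by ms)
        transcript = ""
        for i in range(0, len(mp3), step):
            segment = mp3[i:i+step]
            rec.AcceptWaveform(segment.raw_data)
            result = rec.Result()
            text = json.loads(result)["text"]
            # add a space so words from adjacent chunks do not run together
            transcript += text + " "
        return transcript.strip()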
I need something like this:
time word
-----------------------
(0.0.01, 0.0.2) hi
(0.0.03, 0.0.4) how
(0.0.04, 0.0.5) are
(0.0.05, 0.0.6) you
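Is there any way to get the data like this?
I just found that everything I need is already there once you set
rec.SetWords(True)
All the details are in result = rec.Result(). Besides "text", the parsed JSON then has a "result" list with one entry per word, holding the word, its start and end time in seconds, and a confidence value.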
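For anyone landing here, a minimal sketch of pulling the timings out of that JSON. The helper name word_timings and the sample string are made up for illustration. The field names "result", "start", "end" and "word" are what Vosk returns when SetWords(True) is on. In the function from the question you would call the helper on each rec.Result() inside the loop and collect the tuples.

```python
import json

def word_timings(result_json):
    """Return (start, end, word) tuples from one Vosk result JSON string.

    Hypothetical helper, not part of the Vosk API.
    """
    data = json.loads(result_json)
    # "result" is missing when no words were recognized in the chunk,
    # so fall back to an empty list instead of raising KeyError.
    return [(w["start"], w["end"], w["word"]) for w in data.get("result", [])]

# Made-up sample in the shape rec.Result() produces with SetWords(True).
sample = json.dumps({
    "result": [
        {"conf": 1.0, "start": 0.01, "end": 0.2, "word": "hi"},
        {"conf": 0.9, "start": 0.3, "end": 0.4, "word": "how"},
    ],
    "text": "hi how",
})

print("time\tword")
print("-" * 23)
for start, end, word in word_timings(sample):
    print(f"({start:.2f}, {end:.2f})\t{word}")
```

Using data.get("result", []) matters because a silent chunk gives back only an empty "text" and no "result" key.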