How to get word-level details (timestamps) from Vosk ASR

I am working with Vosk and I need to get the time of each word in my file.mp3. This is my code:

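import json

from pydub import AudioSegment
from vosk import Model, KaldiRecognizer

FRAME_RATE = 16000  # assumed sample rate; the original post does not give it
CHANNELS = 1        # assumed mono; the original post does not give it

def voice_recognition(filename):
    model = Model(model_name="vosk-model-fa-0.5")
    rec = KaldiRecognizer(model, FRAME_RATE)
    rec.SetWords(True)  # ask for per-word details in each result

    mp3 = AudioSegment.from_mp3(filename)
    mp3 = mp3.set_channels(CHANNELS)
    mp3 = mp3.set_frame_rate(FRAME_RATE)

    step = 45000  # chunk length in milliseconds (pydub slices by ms)
    texts = []
    for i in range(0, len(mp3), step):
        segment = mp3[i:i+step]
        rec.AcceptWaveform(segment.raw_data)
        result = rec.Result()
        texts.append(json.loads(result)["text"])
    # Join with spaces so words from adjacent chunks do not run together
    return " ".join(texts)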

I need something like this:

time               word
-----------------------
(0.0.01, 0.0.2)    hi
(0.0.03, 0.0.4)    how
(0.0.04, 0.0.5)    are
(0.0.05, 0.0.6)    you

Is there any way to get the data like this?

There is 1 answer below

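I just found that everything I need is already there: once you call rec.SetWords(True), all the details are in result = rec.Result(). Besides "text", the returned JSON has a "result" list, and each entry in it holds the "word", its "start" and "end" times in seconds, and a confidence value "conf".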
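A minimal sketch of pulling the word timings out of that JSON and printing them in the layout from the question. The helper name word_timings and the sample string are made up for illustration; the sample is hand-written to look like rec.Result() output, not real recognizer output. The "result" key may be missing when nothing was recognized in a chunk, so the code treats it as optional.

```python
import json

def word_timings(result_json):
    """Return (start, end, word) tuples from one Vosk result JSON string."""
    data = json.loads(result_json)
    # "result" may be absent when no words were recognized, so default to []
    return [(w["start"], w["end"], w["word"]) for w in data.get("result", [])]

# Hypothetical sample shaped like rec.Result() with SetWords(True);
# the numbers are illustrative only.
sample = json.dumps({
    "result": [
        {"conf": 1.0, "start": 0.12, "end": 0.45, "word": "hi"},
        {"conf": 0.98, "start": 0.51, "end": 0.80, "word": "how"},
    ],
    "text": "hi how",
})

print("time               word")
print("-----------------------")
for start, end, word in word_timings(sample):
    print(f"({start:.2f}, {end:.2f})    {word}")
```

In the loop from the question, you would call word_timings(result) on each rec.Result() string and extend one list with the rows, instead of keeping only json.loads(result)["text"].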