Using a script very similar to test_ffmpeg.py in the Vosk repository, I am exploring what text information I can get out of an audio file.
Here is the whole script I'm using.
#!/usr/bin/env python3
from vosk import Model, KaldiRecognizer, SetLogLevel
import sys
import os
import subprocess
import json

SetLogLevel(0)

if not os.path.exists("model"):
    print("Please download the model from https://alphacephei.com/vosk/models and unpack as 'model' in the current folder.")
    sys.exit(1)

sample_rate = 16000
model = Model("model")
rec = KaldiRecognizer(model, sample_rate)

# Let ffmpeg decode the input to raw 16-bit little-endian mono PCM on stdout
process = subprocess.Popen(['ffmpeg', '-loglevel', 'quiet', '-i',
                            sys.argv[1],
                            '-ar', str(sample_rate), '-ac', '1', '-f', 's16le', '-'],
                           stdout=subprocess.PIPE)

with open(sys.argv[1] + ".txt", "w") as file:
    while True:
        data = process.stdout.read(4000)
        if len(data) == 0:
            break
        if rec.AcceptWaveform(data):
            # A complete utterance was recognized; write its text
            file.write(json.loads(rec.Result())['text'] + "\n\n")
        #else:
        #    print(rec.PartialResult())
    # FinalResult() flushes any audio still buffered in the recognizer
    file.write(json.loads(rec.FinalResult())['text'])
This example works well. However, the only output I can find from rec.PartialResult() and rec.Result() is a JSON string containing the recognized text. Is there a way to query the KaldiRecognizer for the times at which individual words occur in the audio file?
As I'm typing this, I'm already thinking that processing the results myself, and matching changes in the partial result against the number of samples read so far, will get me what I want, but I'm posting this here in case it's already implemented.
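For reference, the Vosk Python binding's KaldiRecognizer has a SetWords(True) method. Calling it right after creating rec should make the JSON from Result() and FinalResult() also carry a "result" list with per-word "start" and "end" times in seconds (plus a "conf" score). The sketch below only parses that JSON; it assumes that field layout, and the sample string is made up for illustration rather than real recognizer output. word_timings is a hypothetical helper, not part of Vosk.

```python
import json
from typing import List, Tuple


def word_timings(result_json: str) -> List[Tuple[str, float, float]]:
    """Extract (word, start, end) tuples from a Vosk result JSON string.

    Assumes the recognizer had SetWords(True) enabled, so the JSON carries a
    "result" list of {"word", "start", "end", "conf"} objects. If that list
    is absent (word output disabled, or nothing recognized), returns [].
    """
    parsed = json.loads(result_json)
    return [(w["word"], w["start"], w["end"]) for w in parsed.get("result", [])]


# Made-up sample shaped like a Vosk result with word output enabled
sample = (
    '{"result": ['
    '{"conf": 1.0, "end": 0.66, "start": 0.21, "word": "hello"}, '
    '{"conf": 0.98, "end": 1.2, "start": 0.72, "word": "world"}], '
    '"text": "hello world"}'
)

for word, start, end in word_timings(sample):
    print(f"{start:6.2f} - {end:6.2f} s  {word}")
```

In the script above, this would be applied to the strings returned by rec.Result() and rec.FinalResult() instead of extracting only ['text'].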
After some testing, it was pretty clear that ffmpeg's output is stable at the defined sample rate (16000), and that each 4000-byte read turned out to be an eighth of a second: with 16-bit (2-byte) mono samples, 4000 bytes is 2000 samples, and 2000 / 16000 = 0.125 s. I created a counter in the while loop and divided it by a constant based on the sample rate. If you change the parameters passed to ffmpeg (sample rate, channel count, or sample format), this will probably be thrown off.
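The arithmetic behind that constant can be written out so it follows the ffmpeg parameters instead of being hard-coded. This is a minimal sketch assuming the s16le output format (2 bytes per sample) used in the script; the helper names are hypothetical.

```python
BYTES_PER_SAMPLE = 2  # -f s16le: 16-bit signed little-endian samples


def bytes_per_second(sample_rate: int, channels: int = 1) -> int:
    """Raw PCM bytes ffmpeg emits per second of audio."""
    return sample_rate * channels * BYTES_PER_SAMPLE


def seconds_for(num_bytes: int, sample_rate: int, channels: int = 1) -> float:
    """Audio duration represented by num_bytes of raw PCM."""
    return num_bytes / bytes_per_second(sample_rate, channels)


print(seconds_for(4000, 16000))       # one 4000-byte read: 0.125
print(seconds_for(40 * 4000, 16000))  # position after 40 reads: 5.0
```

Summing len(data) on each read and passing the total to seconds_for, rather than multiplying a read counter by 4000, also stays correct if the last read from ffmpeg comes back short.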
I used some very stone-age string comparison to print only when the partial result changes, and to print only the newly added characters.
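A prefix comparison is one simple way to do that. The sketch below assumes PartialResult() returns JSON of the form {"partial": "..."}; the list of partial strings is made up to stand in for successive calls, and new_partial_text is a hypothetical helper. Because the recognizer can revise earlier words, the helper falls back to returning the whole current text when the old text is no longer a prefix.

```python
import json


def new_partial_text(previous: str, current: str) -> str:
    """Return the part of `current` added since `previous`.

    If earlier words were revised (current no longer starts with previous),
    the whole current text is returned instead.
    """
    if current.startswith(previous):
        return current[len(previous):]
    return current


# Made-up stand-ins for successive rec.PartialResult() strings
partials = [
    '{"partial": ""}',
    '{"partial": "the"}',
    '{"partial": "the quick"}',
    '{"partial": "the quick"}',
    '{"partial": "the quick brown"}',
]

last = ""
for p in partials:
    text = json.loads(p)["partial"]
    if text != last:  # print only when the partial result changes
        print(repr(new_partial_text(last, text)))
        last = text
```

Run as-is, this prints 'the', ' quick' and ' brown' on separate lines. In the real loop, last should be reset to "" whenever AcceptWaveform() returns True, since the next partial result starts a new utterance.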