Vosk's accuracy when recognizing the content of audio recordings is quite low

I want to use Java to recognize speech in local audio recordings. I used Vosk and tested the large Chinese model and the large Arabic model. With the Chinese model, some of the results for recordings with Chinese content come out garbled. With the Arabic model, very little of the content of Arabic recordings is recognized, and for some recordings nothing is recognized at all.

This is the Java code I used:

private static Float getSampleRate(File file) throws Exception {
    WavFileReader fileReader = new WavFileReader();
    AudioFile audioFile = fileReader.read(file);
    String sampleRate = audioFile.getAudioHeader().getSampleRate();
    return Float.parseFloat(sampleRate);
}

public static void processFile(File file) {
    // Model and Recognizer hold native resources, so close them with try-with-resources.
    try (Model model = new Model("F:\\vosk\\vosk-model-ar-mgb2-0.4"); // or vosk-model-cn-0.22
         InputStream ais = AudioSystem.getAudioInputStream(new BufferedInputStream(new FileInputStream(file)));
         Recognizer recognizer = new Recognizer(model, getSampleRate(file))) {
        int bytes;
        byte[] b = new byte[40960];
        // Feed the decoded audio to the recognizer in chunks.
        while ((bytes = ais.read(b)) >= 0) {
            recognizer.acceptWaveForm(b, bytes);
        }
        // Attempted workaround for the garbled output (disabled):
        // System.out.println(new String(recognizer.getFinalResult().getBytes("GBK"), "UTF-8"));
        System.out.println(recognizer.getFinalResult());
    } catch (Exception e) {
        e.printStackTrace();
    }
}

public static void main(String[] args) {
    LibVosk.setLogLevel(LogLevel.DEBUG);
    File file3 = new File("F:\\KuGou\\2_out.wav");
    VoskWavRecognition.processFile(file3);
}
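
Since the recognizer is fed raw bytes, it may be worth confirming what format the WAV file actually has. The sketch below uses only javax.sound.sampled to report the format and check whether it is 16-bit mono signed little-endian PCM, which is the input Vosk's own examples assume. The class name WavFormatCheck and its helper are made up for illustration, and whether a format mismatch explains the low accuracy here is only a guess.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.File;

public class WavFormatCheck {

    /** Returns true when the format is mono, 16-bit, signed, little-endian PCM. */
    public static boolean isMono16BitPcm(AudioFormat format) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding())
                && format.getChannels() == 1
                && format.getSampleSizeInBits() == 16
                && !format.isBigEndian();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("usage: java WavFormatCheck <file.wav>");
            return;
        }
        // Read only the header information; the audio data itself is not decoded here.
        try (AudioInputStream ais = AudioSystem.getAudioInputStream(new File(args[0]))) {
            AudioFormat format = ais.getFormat();
            System.out.println("format:      " + format);
            // The same value could be passed to the Recognizer constructor.
            System.out.println("sample rate: " + format.getSampleRate());
            System.out.println("mono 16-bit PCM: " + isMono16BitPcm(format));
        }
    }
}
```

If the check prints false (for example for a stereo file), converting the recording to mono 16-bit PCM with an external tool before recognition would be one thing to try.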

I want to improve the recognition accuracy, and I want the results for Chinese recordings to come out without garbled characters.
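
The garbled Chinese text may be an encoding problem rather than a recognition problem. The sketch below only demonstrates the mechanism with the standard library: UTF-8 bytes decoded with a different charset no longer match the original string. The class name MojibakeDemo is made up, and the idea that the result string is being decoded with the platform default charset (GBK on a Chinese-locale Windows machine) is an assumption, not something confirmed for this setup.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    /**
     * Charset used to simulate the wrong decoding: GBK when this JVM provides it,
     * otherwise ISO-8859-1 as a stand-in.
     */
    public static Charset wrongCharset() {
        return Charset.isSupported("GBK") ? Charset.forName("GBK") : StandardCharsets.ISO_8859_1;
    }

    /** Decodes the given bytes with the given charset. */
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Unicode escapes for two Chinese characters, so the source file encoding does not matter.
        String original = "\u4f60\u597d";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        System.out.println("platform default charset: " + Charset.defaultCharset());
        System.out.println("decoded as UTF-8 matches original: "
                + decode(utf8, StandardCharsets.UTF_8).equals(original));
        System.out.println("decoded as " + wrongCharset() + " matches original: "
                + decode(utf8, wrongCharset()).equals(original));
    }
}
```

If the Vosk Java binding passes its result strings through JNA (an assumption about the binding's internals), starting the JVM with -Djna.encoding=UTF-8 might be worth trying, because JNA falls back to the platform default charset when that property is not set.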
