Sphinx4 performance problems while multithreading


I took the Sphinx4 HelloWorld example and wrote my own grammar file with sentences like "what is a virus" or "what is application software". It is simple JSGF, and I tagged every sentence separately, as in:

public <0> =     What is number twelve | 
                 What is twelve;
public <1> =     What is the title bar;
public <2> =     What is control | 
                 What is the control key;

I don't use an n-gram model, since I don't fully understand it and I don't think it applies to such a simple example. The code is a copy-paste of HelloWorld.java, and the recognition worked pretty well: I'd say it was about 90% accurate.

Then I took that code, put it in a Runnable, and started it on a new thread. Suddenly the recognition is horrendous, at about 10% (1 in 10 results are correct).

I capture the sound with my built-in laptop microphone directly in the application. I've seen advice that the audio should be resampled to match the acoustic model I use (the standard WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz). So my first question is: does the built-in microphone.startRecording() method do any of that? I ask because HelloWorld running on the main thread doesn't seem to need resampling.
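To make the sample-rate question concrete, here is a minimal sketch that asks Java Sound whether it reports a capture line for 16 kHz, 16-bit, mono audio. The "16k" in the model name suggests 16 kHz; the remaining format details (signed, big-endian) and the class name CaptureFormatCheck are my assumptions, not something taken from sapi.config.xml. It does not show what startRecording() does internally.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.TargetDataLine;

final class CaptureFormatCheck {

    // Assumed target format: 16 kHz, 16-bit, mono, signed PCM, big-endian.
    static AudioFormat targetFormat() {
        return new AudioFormat(16000f, 16, 1, true, true);
    }

    // True if any installed mixer reports a capture line for this format.
    static boolean isSupported(AudioFormat format) {
        DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
        return AudioSystem.isLineSupported(info);
    }

    public static void main(String[] args) {
        AudioFormat format = targetFormat();
        System.out.println(format + " -> capture line reported: " + isSupported(format));
    }
}
```

If this prints false on the laptop, the audio is being captured in some other format and converted somewhere, which would be worth ruling out before blaming the thread.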

My second question: am I right in thinking that multithreading decreases the recognition performance significantly? If so, is there a way to fix that without a huge overhaul of the code?

For the record, I ask because I am writing a simple Jeopardy-like game with speech recognition in Java using SWT and Sphinx4. The main app runs on the main thread and the recognition runs on another. I currently use the listener-based recognition from the ZipCity example, but it works horrendously even when it runs on the main thread, so I'll be switching to the simpler approach, which is why I did the HelloWorld test.
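For reference, the threading layout I have in mind for the game looks roughly like the sketch below: a worker thread loops on recognition and hands non-empty results to the UI thread through a queue. The names RecognitionWorker and source are hypothetical, and the Supplier<String> is only a stand-in for a call such as recognizer.recognize() followed by getBestFinalResultNoFiller(), so the sketch runs without Sphinx4.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Supplier;

final class RecognitionWorker implements Runnable {

    // Hypothetical stand-in for the blocking recognizer call.
    private final Supplier<String> source;
    // Results handed over to the consuming (for example UI) thread.
    private final BlockingQueue<String> results = new LinkedBlockingQueue<>();
    // Volatile so that stop() called from another thread is seen by run().
    private volatile boolean running = true;

    RecognitionWorker(Supplier<String> source) {
        this.source = source;
    }

    @Override
    public void run() {
        while (running) {
            String text = source.get();
            // Skip null and empty results instead of queueing them.
            if (text != null && !text.isEmpty()) {
                results.add(text);
            }
        }
    }

    // Asks the loop to exit after the current recognition call returns.
    void stop() {
        running = false;
    }

    BlockingQueue<String> results() {
        return results;
    }

    // Starts the loop on a daemon thread so it cannot keep the JVM alive.
    Thread start() {
        Thread thread = new Thread(this, "recognition");
        thread.setDaemon(true);
        thread.start();
        return thread;
    }
}
```

The consuming thread can then call results().poll() from a timer or idle callback without ever blocking on the recognizer.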

EDIT: I forgot to mention that I usually get an empty result text in the low-accuracy version.

Here is the code, although it is essentially the same as the example:

The version that works well:

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class main_class {

    public static void main(String[] args) {
        // load the recognizer configuration from the classpath
        ConfigurationManager cm = new ConfigurationManager(
                main_class.class.getResource("/jsapi_pr/res/sapi.config.xml"));

        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone, or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        // loop the recognition until the program exits
        while (true) {

            System.out.println(recognizer.getState());

            Result result = recognizer.recognize();

            if (result != null) {
                String resultText = result.getBestFinalResultNoFiller();
                // empty results are silently skipped
                if (!resultText.isEmpty()) {
                    System.out.println("You said: " + resultText + '\n');
                }
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }
}

The version that works badly:

public class main_class {

    public static void main(String[] args) {
        runnable_test test = new runnable_test();
        test.begin();
    }
}

public class runnable_test implements Runnable {

    @Override
    public void run() {

        ConfigurationManager cm;

        cm = new ConfigurationManager(main_class.class.getResource("/jsapi_pr/res/sapi.config.xml"));

        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // start the microphone, or exit the program if this is not possible
        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        // loop the recognition until the program exits.
        while (true) {

            System.out.println(recognizer.getState());

            Result result = recognizer.recognize();

            if (result != null) {
                String resultText = result.getBestFinalResultNoFiller();
                if(!resultText.isEmpty()) {
                    System.out.println("You said: " + resultText + '\n');
                }
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }

    public void begin() {
        Thread thread = new Thread(this);
        thread.start();
    }
}

I'll try to post some results soon, but as I said, the first version works fine, while the second usually ends up with resultText.isEmpty() being true, and even when it "recognizes" something, it's usually wrong.

EDIT2: I boosted my microphone gain and volume, and it is working much better. It still boggles my mind why this happens, though, because as I said, the results without boosting the microphone are still very good when running on the main thread.

The accuracy of the main application is much better too, going from 2 in 12 correct to 6 in 12.
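Since the microphone boost made such a difference, one way to compare the two setups objectively would be to measure the input level of the captured audio. The sketch below computes the RMS level in dBFS of a buffer of 16-bit signed big-endian PCM. That sample layout is an assumption about the capture format, and InputLevel and rmsDbfs are hypothetical names, not part of Sphinx4.

```java
final class InputLevel {

    // RMS level in dBFS for 16-bit signed big-endian PCM (assumed layout).
    // Returns negative infinity for an empty or completely silent buffer.
    static double rmsDbfs(byte[] pcm) {
        int samples = pcm.length / 2;
        if (samples == 0) {
            return Double.NEGATIVE_INFINITY;
        }
        double sumSquares = 0.0;
        for (int i = 0; i < samples; i++) {
            // Combine the high and low byte into one signed 16-bit sample.
            int sample = (short) ((pcm[2 * i] << 8) | (pcm[2 * i + 1] & 0xFF));
            double normalized = sample / 32768.0;
            sumSquares += normalized * normalized;
        }
        double rms = Math.sqrt(sumSquares / samples);
        return rms == 0.0 ? Double.NEGATIVE_INFINITY : 20.0 * Math.log10(rms);
    }
}
```

Logging this value for the same spoken phrase in both versions would show whether the recognizer is actually receiving quieter audio in the threaded case.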
