Creating voice command synonyms in GRXML


I've created a voice-controlled UWP application in C++/CX (for HoloLens, if that matters). It's a very simple one, mostly following some samples; this is the speech recognition event handler:

void MyAppMain::HasSpoken(SpeechContinuousRecognitionSession ^sender, SpeechContinuousRecognitionResultGeneratedEventArgs ^args)
{
    if (args->Result->Confidence == SpeechRecognitionConfidence::Medium
        || args->Result->Confidence == SpeechRecognitionConfidence::High)
    {
        process_voice_command(args->Result->Text);
    }
}
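
For context, process_voice_command is the application's own dispatcher, not a platform API. A minimal sketch of it in plain C++ (so it compiles outside UWP; the command names and actions are hypothetical) might look like this:

```cpp
#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>

// Hypothetical dispatcher: maps each recognized phrase to an action.
// In the real app the text arrives as Platform::String^; plain
// std::wstring stands in here so the sketch compiles outside UWP.
bool process_voice_command(const std::wstring& text)
{
    static const std::unordered_map<std::wstring, std::function<void()>> actions = {
        { L"next", [] { std::wcout << L"advancing to the next step\n"; } },
        { L"back", [] { std::wcout << L"returning to the previous step\n"; } },
    };

    auto it = actions.find(text);
    if (it == actions.end())
        return false;   // unknown phrase: silently ignore it
    it->second();       // run the matching action
    return true;
}
```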

Everything works so far; the recognition result is in the args->Result->Text variable. Now, I only need to support a very limited set of voice commands and simply ignore everything else, but within that limited set of commands I want some variability. It seems the last example on this page is exactly about that. So I made the following grammar file based on it:

<grammar version="1.0" xml:lang="en-US" root="nextCommands" xmlns="http://www.w3.org/2001/06/grammar" tag-format="semantics/1.0">

  <rule id="nextCommands">
    <item>
      <one-of>
        <item>next</item>
        <item>go</item>        
        <item>advance</item>
      </one-of>
      <tag>out="next";</tag>
    </item>
  </rule>

</grammar>

What I want is that when I say either "next", "go", or "advance", the recognition engine just returns "next", so that is what ends up in args->Result->Text above. What it actually does right now is limit the set of recognized words to those three, but it simply returns the word I say, without converting it to "next". It looks like it either ignores the <tag> element, or I have to retrieve its content in a different way in my C++/CX program, or <tag> doesn't work the way I think it does. What should I change to make it work?



BEST ANSWER

I have found a way to do what I want with SRGS (at least for the very simple case described in the question). It seems <tag> doesn't change the recognition result directly (at least not with tag-format="semantics/1.0"; there are other tag-format values, described for example here, which may behave differently). Instead, it populates an additional collection of properties. So this is how I changed my grammar for now:

<grammar version="1.0" xml:lang="en-US" 
root="nextCommands" xmlns="http://www.w3.org/2001/06/grammar" 
tag-format="semantics/1.0">

  <rule id="nextCommands">
    <item>
      <one-of>
        <item>next</item>
        <item>go</item>        
        <item>advance</item>
      </one-of>
      <tag>out.HONEY="bunny";</tag>
    </item>
  </rule>

</grammar>

Now, when either "next", "go", or "advance" is recognized, it still goes to args->Result->Text unchanged, but there will also be a new pair in args->Result->SemanticInterpretation->Properties with the key HONEY and the value bunny. I can check whether that was the case with

args->Result->SemanticInterpretation->Properties->HasKey("HONEY");

and, if so, retrieve the value of it with

args->Result->SemanticInterpretation->Properties->Lookup("HONEY")->GetAt(0); //returns "bunny"
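
The WinRT types above only exist inside a UWP app, but the lookup logic itself is simple. Here is a plain-C++ stand-in (the mock type, helper name, and sample data are purely illustrative) showing the same HasKey/Lookup pattern:

```cpp
#include <map>
#include <string>
#include <vector>

// Plain-C++ stand-in for args->Result->SemanticInterpretation->Properties,
// which in WinRT is a map from a key string to a vector of value strings.
using Properties = std::map<std::wstring, std::vector<std::wstring>>;

// Mirrors the pattern from the answer: if the grammar's <tag> fired,
// the HONEY key is present and its first value is "bunny".
std::wstring semantic_value(const Properties& props, const std::wstring& key)
{
    auto it = props.find(key);                 // HasKey(...)
    if (it != props.end() && !it->second.empty())
        return it->second.front();             // Lookup(...)->GetAt(0)
    return L"";                                // key absent: the tag didn't fire
}
```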
ANOTHER ANSWER

Or <tag> doesn't work the way I think it does

A tag is a legal rule expansion. Tags do not affect the legal word patterns defined by the grammars or the process of recognizing speech or other input given a grammar. For details, please check the Tags section of the Speech Recognition Grammar Specification.

What I want with it is that when I say either "next", "go" or "advance", the recognition engine just returns "next"

Speech recognition converts words spoken by the user into text for form input. Constraints, or grammars, define the spoken words and phrases that can be matched by the speech recognizer. The grammar you used defines the words to match. If you want "next", "go", or "advance" to execute the same command, you could handle them when you process the text result. For example,

// Start recognition.
Windows.Media.SpeechRecognition.SpeechRecognitionResult speechRecognitionResult = await speechRecognizer.RecognizeWithUIAsync();
// Do something with the recognition result.
if (speechRecognitionResult.Text == "go" || speechRecognitionResult.Text == "next" || speechRecognitionResult.Text == "advance")
{
    // Treat "go", "next", and "advance" as the same command.
}
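
If the synonym list grows, the if-chain above becomes unwieldy. One alternative is to collapse every synonym to a single canonical command before dispatching. A sketch in plain C++ (std::wstring stands in for the real Platform::String^; the synonym set is the one from the question):

```cpp
#include <string>
#include <unordered_set>

// Map every synonym of "next" onto the canonical command "next";
// any other word passes through unchanged.
std::wstring canonicalize(const std::wstring& text)
{
    static const std::unordered_set<std::wstring> next_synonyms =
        { L"next", L"go", L"advance" };
    if (next_synonyms.count(text))
        return L"next";
    return text;
}
```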

For details, please reference Scenario_SRGSConstraint in the official sample, which contains the method HandleRecognitionResult.