I'm using System.Speech to recognize some phrases or words. One of them is "Set timer". I would like to expand this to "Set timer for X seconds" and have the code set a timer for X seconds. Is this possible? I have little to no experience with this so far; all I could find is that I have to do something with the Grammar class.
Right now I have set up my recognition engine like this:
SpeechRecognitionEngine = new SpeechRecognitionEngine();
SpeechRecognitionEngine.SetInputToDefaultAudioDevice();
var choices = new Choices();
choices.Add("Set timer");
var gb = new GrammarBuilder();
gb.Append(choices);
var g = new Grammar(gb);
SpeechRecognitionEngine.LoadGrammarAsync(g);
SpeechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
SpeechRecognitionEngine.SpeechRecognized += OnSpeechRecognized;
Is there a way to do this?
First, there is no built-in concept of a number. Speech is just a sequence of words, so to recognize numbers you need to recognize the words that mean numbers, such as "one" and "fifteen". Some numbers are represented by multiple words, such as "one hundred" or "fifty one", and you need to recognize those too.
You can start with just recognizing numbers from 1 to 9:
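The listing for this step is not shown here, so the following is only a minimal sketch of what it could look like. The class and method names (TimerGrammarSketch, BuildTimerGrammar) are made up for illustration, and it assumes an en-US recognizer is installed on the machine.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

static class TimerGrammarSketch
{
    // Builds the phrase: "set timer for" + one of "one".."nine" + "seconds".
    public static GrammarBuilder BuildTimerGrammar()
    {
        string[] words = { "one", "two", "three", "four", "five",
                           "six", "seven", "eight", "nine" };

        var digits = new Choices();
        for (int i = 0; i < words.Length; i++)
        {
            // Tag each number word with the integer it stands for (1..9).
            digits.Add(new GrammarBuilder(new SemanticResultValue(words[i], i + 1)));
        }

        // Assumption: an en-US recognizer is available.
        var gb = new GrammarBuilder { Culture = new CultureInfo("en-US") };
        gb.Append("set timer for");
        gb.Append(digits);
        gb.Append("seconds");
        return gb;
    }

    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            engine.SetInputToDefaultAudioDevice();
            // Load synchronously so the grammar is ready before recognition starts.
            engine.LoadGrammar(new Grammar(BuildTimerGrammar()));
            engine.SpeechRecognized += (s, e) => Console.WriteLine("Heard: " + e.Result.Text);
            engine.RecognizeAsync(RecognizeMode.Multiple);

            Console.WriteLine("Speak, then press any key to exit.");
            Console.ReadKey();
            engine.RecognizeAsyncCancel();
        }
    }
}
```

Loading the grammar with LoadGrammar rather than LoadGrammarAsync, and subscribing to SpeechRecognized before calling RecognizeAsync, avoids starting recognition before everything is in place.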
So our grammar can be read as: "set timer for", followed by one number word from "one" to "nine", followed by "seconds".
We use SemanticResultValue to assign a tag to a specific phrase. In this case the tag is the number (1, 2, 3...) corresponding to a specific word ("one", "two", "three"). By doing that, you can extract that value from the recognition result in your SpeechRecognized handler.
This is already a working example which will recognize phrases like "set timer for five seconds" and allow you to extract the semantic value (5) from them.
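The handler listing is missing here, so this is a sketch of how the value could be read. It assumes the number was attached with a SemanticResultValue that is not wrapped in a SemanticResultKey, in which case it should surface as the Value of the result's top-level Semantics. ReadSeconds is a hypothetical helper name.

```csharp
using System;
using System.Speech.Recognition;

static class TimerHandlerSketch
{
    // Returns the tagged number, or null if the result carries no integer value.
    public static int? ReadSeconds(SemanticValue semantics)
    {
        if (semantics == null || !(semantics.Value is int))
        {
            return null;
        }
        return (int)semantics.Value;
    }

    // Wire up with: engine.SpeechRecognized += TimerHandlerSketch.OnSpeechRecognized;
    public static void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        int? seconds = ReadSeconds(e.Result.Semantics);
        if (seconds.HasValue)
        {
            // Start your own timer here; this sketch only reports the value.
            Console.WriteLine("Set timer for " + seconds.Value + " seconds");
        }
    }
}
```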
Now you could combine various number words together, for example:
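The example for this step is not shown, so here is one possible sketch: a tens word followed by an optional units word. It deliberately attaches no semantic values, so it recognizes phrases such as "set timer for fifty one seconds" but leaves you to work out the number from the recognized text. The class name and the word lists are illustrative only.

```csharp
using System.Globalization;
using System.Speech.Recognition;

static class CompoundNumberSketch
{
    public static GrammarBuilder BuildTimerGrammar()
    {
        var tens = new Choices("twenty", "thirty", "forty", "fifty");
        var units = new Choices("one", "two", "three", "four", "five",
                                "six", "seven", "eight", "nine");

        // Assumption: an en-US recognizer is available.
        var gb = new GrammarBuilder { Culture = new CultureInfo("en-US") };
        gb.Append("set timer for");
        gb.Append(tens);
        // minRepeat 0, maxRepeat 1: the units word is optional ("fifty" or "fifty one").
        gb.Append(new GrammarBuilder(units), 0, 1);
        gb.Append("seconds");
        return gb;
    }
}
```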
But it gets tricky to correctly assign semantic values to them, because the GrammarBuilder API is not powerful enough to do that.
When what you want cannot (easily) be done with GrammarBuilder and related classes alone, you have to use the more powerful XML grammar files, whose syntax is defined in the W3C SRGS specification.
A description of those grammar files is out of scope for this question, but fortunately there is already a grammar file for your task in the Microsoft Speech SDK, which you have probably already downloaded and installed. So, copy the file from "C:\Program Files\Microsoft SDKs\Speech\v11.0\Samples\Sample Grammars\en-US.grxml" (or wherever you installed the SDK) and remove some irrelevant things, such as the first <tag> element with the large CDATA section inside.
The rule of interest in this file is named "Cardinal" and recognizes numbers from 0 to 1 million. Then our code becomes:
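The listing is missing at this point, so what follows is a sketch rather than the original code. It assumes the trimmed grammar file uses the semantics/1.0 tag format and that its "Cardinal" rule returns the recognized number as its result. The rule id "Timer", the key "seconds", and the grammarPath parameter are names chosen for illustration.

```csharp
using System.Speech.Recognition;
using System.Speech.Recognition.SrgsGrammar;

static class CardinalGrammarSketch
{
    // grammarPath: path to your trimmed copy of the SDK sample grammar.
    public static Grammar BuildTimerGrammar(string grammarPath)
    {
        var document = new SrgsDocument(grammarPath);

        // New rule: "set timer for" <Cardinal> "seconds".
        var timer = new SrgsRule("Timer");
        timer.Scope = SrgsRuleScope.Public;
        timer.Add(new SrgsItem("set timer for"));
        timer.Add(new SrgsRuleRef(document.Rules["Cardinal"]));
        // Assumption: semantics/1.0 script syntax; copy the result of the
        // rule matched just before this tag into a "seconds" property.
        timer.Add(new SrgsSemanticInterpretationTag("out.seconds = rules.latest();"));
        timer.Add(new SrgsItem("seconds"));

        document.Rules.Add(timer);
        document.Root = timer; // recognition starts from this rule
        return new Grammar(document);
    }
}
```

The resulting Grammar is loaded into the engine with LoadGrammar in the same way as one built from a GrammarBuilder.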
And the handler becomes:
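The handler listing is not shown, so this is a sketch under the assumption that the grammar stores the number under a key named "seconds" (a hypothetical name; use whatever your rule's tag assigns). The value is converted defensively because the runtime type produced by a grammar script may not be a plain int.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

static class KeyedTimerHandlerSketch
{
    // Returns the number stored under the "seconds" key, or null if it is absent.
    public static int? ReadSeconds(SemanticValue semantics)
    {
        if (semantics == null || !semantics.ContainsKey("seconds"))
        {
            return null;
        }
        object value = semantics["seconds"].Value;
        if (value == null)
        {
            return null;
        }
        return Convert.ToInt32(value, CultureInfo.InvariantCulture);
    }

    // Wire up with: engine.SpeechRecognized += KeyedTimerHandlerSketch.OnSpeechRecognized;
    public static void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        int? seconds = ReadSeconds(e.Result.Semantics);
        if (seconds.HasValue)
        {
            Console.WriteLine("Set timer for " + seconds.Value + " seconds");
        }
    }
}
```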
Now you can recognize numbers up to 1 million.
Of course, it's not necessary to define the rule in code like we did above. You can define all your rules completely in XML, then just load the file as an SrgsDocument and create a Grammar from it.
If you want to recognize multiple commands, here is a sample:
And the handler becomes:
For completeness, here is how you can do the same with pure XML. Open that "en-US-sample.grxml" file with an XML editor and add the rules we defined above in code. They will look like this:
Now set the root rule on the root grammar tag:
And save.
Now we don't need to define anything at all in code, all we need to do is load our grammar file:
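The loading code is not shown here, so this is a minimal sketch. It assumes the edited "en-US-sample.grxml" sits next to the executable and that an en-US recognizer is installed. The class name is illustrative.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

static class LoadGrammarFileSketch
{
    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            engine.SetInputToDefaultAudioDevice();
            // Grammar(string) loads the file and uses the root rule declared in it.
            engine.LoadGrammar(new Grammar("en-US-sample.grxml"));
            engine.SpeechRecognized += (s, e) => Console.WriteLine("Heard: " + e.Result.Text);
            engine.RecognizeAsync(RecognizeMode.Multiple);

            Console.WriteLine("Speak, then press any key to exit.");
            Console.ReadKey();
            engine.RecognizeAsyncCancel();
        }
    }
}
```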
That's all. Because the "Timers" rule is the root rule in the grammar file, it will be used for recognition and will behave exactly like the version we defined in code.