I am trying to do some simple speech recognition (from a .wav file) using PowerShell. I am using Microsoft.Speech.Recognition.SpeechRecognitionEngine. Sadly I have some serious problems with it, but first off, here is my code:
[System.Reflection.Assembly]::LoadFrom("C:\Program Files\Microsoft SDKs\Speech\v11.0\Assembly\Microsoft.Speech.dll")
[System.Reflection.Assembly]::LoadWithPartialName("System.Speech")
$cult = New-Object System.Globalization.CultureInfo("en-US")
$listener = New-Object Microsoft.Speech.Recognition.SpeechRecognitionEngine($cult)
$listener.SetInputToWaveFile("C:\Users\user\Downloads\audio.wav")
$arr = @("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q" ,"r", "s", "t", "u","v","w","x","y","z","four","red")
$text = New-Object Microsoft.Speech.Recognition.Choices
$text.Add($arr)
$toGram = New-Object Microsoft.Speech.Recognition.GrammarBuilder($text)
$toGram.Culture = $cult
$gram = New-Object Microsoft.Speech.Recognition.Grammar($toGram)
$listener.LoadGrammar($gram)
Register-ObjectEvent $listener RecognizeCompleted -SourceIdentifier "RecognizeCompleted" -Action {if($EventArgs){$EventArgs.Result.Text; write-host $EventArgs.Result.Confidence} else {write-host "nope"} }
$listener.RecognizeAsync()
My problem is that when I use .Recognize() I get no output at all, not even output with 0 results.
When registering for the completion of the async method (.RecognizeAsync()), the handler gets called and $EventArgs does exist, but I cannot access any properties of the variable or even get output from Get-Member.
Am I doing something obviously wrong here? I would appreciate any input, as I'm kind of going mad right now...
I would also be open to any alternatives to the MS Speech API (any command-line tool that can do basic speech recognition in English would do).
Update: the wave file contains a series of letters or numbers, for example "3 D 6 H Y".
Update: I appreciate edits, but I don't appreciate someone removing code! Thanks! Don't do it!
Update: it seems SAPI doesn't handle single characters very well (if at all). I'll probably try Sphinx next. Thanks, though, to brandon for investing so much time to help me.
This is from my removed comment as it's part of the answer:
Recognize() is blocking. It performs one single recognition action per call, the way you have it now. I don't have any experience with PowerShell, so correct me if I'm wrong, but it looks like you'd have to call that function (or procedure, or script, etc.) every time you want a recognition. Basically: if it hears "A", that's it; you have to call Recognize again to get "B". Try it with a microphone (SetInputToDefaultAudioDevice). Lastly, Recognize[Async]() raises the SpeechRecognized event, where you retrieve results, which it doesn't look like you handle.
You'll probably want to call RecognizeAsync (with RecognizeMode.Multiple) instead, so the engine can handle more than one bit of spoken text in the same action. It can be done both ways, however.
Again, because I'm not familiar with PowerShell, here's some pseudo/C# code to get you on the right track:
Recognize() method:
RecognizeAsync() method:
Here's a link to the RecognizeAsync() MSDN doc, which will show you the events raised by the Recognize family: http://msdn.microsoft.com/en-us/library/system.speech.recognition.speechrecognitionengine.recognizeasync%28v=vs.110%29.aspx
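For the blocking variant, a rough PowerShell sketch of the "call Recognize once per utterance" idea might look like the following. It is untested and rests on assumptions: the Speech Platform SDK path and the wave file path are taken from the question, an en-US runtime language pack is assumed to be installed, and the shortened word list and the variable names ($engine, $choices, $builder) are only illustrative.

```powershell
# Sketch only: assumes Microsoft Speech Platform SDK 11 plus an en-US
# runtime language pack are installed at the path used in the question.
Add-Type -Path "C:\Program Files\Microsoft SDKs\Speech\v11.0\Assembly\Microsoft.Speech.dll"

$culture = New-Object System.Globalization.CultureInfo("en-US")
$engine  = New-Object Microsoft.Speech.Recognition.SpeechRecognitionEngine($culture)

# Small illustrative grammar; the question uses the full alphabet plus a few words.
$choices = New-Object Microsoft.Speech.Recognition.Choices
$choices.Add([string[]]@("d", "h", "y", "four", "red"))
$builder = New-Object Microsoft.Speech.Recognition.GrammarBuilder($choices)
$builder.Culture = $culture
$engine.LoadGrammar((New-Object Microsoft.Speech.Recognition.Grammar($builder)))

$engine.SetInputToWaveFile("C:\Users\user\Downloads\audio.wav")

# Each Recognize() call blocks and yields at most one result, so keep
# calling it. The loop stops at the first $null, which can mean the end
# of the input or a recognition that did not succeed.
while ($null -ne ($result = $engine.Recognize())) {
    "{0} (confidence {1:N2})" -f $result.Text, $result.Confidence
}

$engine.Dispose()
```

Because a failed recognition also returns $null, this loop can stop before the file is exhausted; handling the SpeechRecognized and SpeechRecognitionRejected events with RecognizeAsync is the way to tell those cases apart.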