How to Implement SFSpeechRecognitionTaskDelegate


I am trying to implement a speech-to-text application. I am able to record audio from the microphone using SFSpeechRecognizer. The use case is that as soon as the user stops speaking, a method should be invoked that stops the recording automatically. Could you help me with this use case?

Here is my code:

func startRecording() {
    
    // Clear all previous session data and cancel task
    if recognitionTask != nil {
        recognitionTask?.cancel()
        recognitionTask = nil
    }

    // Create instance of audio session to record voice
    let audioSession = AVAudioSession.sharedInstance()
    do {
        // .defaultToSpeaker is only valid with the .playAndRecord category, so use .duckOthers here
        try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
        try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
    } catch {
        print("audioSession properties weren't set because of an error.")
    }

    self.recognitionRequest = SFSpeechAudioBufferRecognitionRequest()

    let inputNode = audioEngine.inputNode

    guard let recognitionRequest = recognitionRequest else {
        fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
    }

    recognitionRequest.shouldReportPartialResults = true
    self.recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in

        var isFinal = false
        if let result = result {
            // Unwrap once instead of optional-chaining and force-unwrapping
            self.lblText.text = result.bestTranscription.formattedString
            print(result.bestTranscription.formattedString)
            print(result.isFinal)
            isFinal = result.isFinal
        }
        
        if error != nil || isFinal {

            self.audioEngine.stop()
            inputNode.removeTap(onBus: 0)

            self.recognitionRequest = nil
            self.recognitionTask = nil

            self.btnStart.isEnabled = true
           
        }
    })

    let recordingFormat = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
        self.recognitionRequest?.append(buffer)
    }
    self.audioEngine.prepare()

    do {
        try self.audioEngine.start()
    } catch {
        print("audioEngine couldn't start because of an error.")
    }
    self.lblText.text = "Say something, I'm listening!"

}

There are 2 best solutions below


As far as I understand, you want to stop recognition when the user stops speaking. I suggest using a Timer to track the time spent in silence. Declare var detectionTimer: Timer? as a property outside startRecording(), and inside the resultHandler of recognitionTask insert:

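// Restart the silence countdown each time a result arrives
self.detectionTimer?.invalidate()
self.detectionTimer = Timer.scheduledTimer(withTimeInterval: 2, repeats: false) { [weak self] _ in
    // Capture self weakly so a pending timer does not keep the controller alive
    self?.stopRecording()
}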

This way, every recognised word restarts the timer, and the timer stops recognition if nothing new is captured for 2 seconds. stopRecording() should look something like this:

audioEngine.stop()
recognitionRequest?.endAudio()
recognitionRequest = nil
audioEngine.inputNode.removeTap(onBus: 0)
// Cancel the previous task if it's running
if let recognitionTask = recognitionTask {
  recognitionTask.cancel()
  self.recognitionTask = nil
}
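
The restart-on-every-result pattern above can also be wrapped in a small helper so the view controller does not manage the Timer directly. This is only a sketch: SilenceTimer and its onSilence callback are hypothetical names, not part of the Speech framework, and it assumes restart() is called on a thread with a running run loop (the main thread, which is where SFSpeechRecognizer delivers results by default).

```swift
import Foundation

/// Hypothetical helper (not a framework type): calls `onSilence` once if
/// `restart()` has not been called again within `timeout` seconds.
final class SilenceTimer {
    private var timer: Timer?
    private let timeout: TimeInterval
    private let onSilence: () -> Void

    init(timeout: TimeInterval, onSilence: @escaping () -> Void) {
        self.timeout = timeout
        self.onSilence = onSilence
    }

    /// Call whenever a new result arrives; any pending countdown is replaced.
    func restart() {
        timer?.invalidate()
        timer = Timer.scheduledTimer(withTimeInterval: timeout, repeats: false) { [weak self] _ in
            self?.timer = nil
            self?.onSilence()
        }
    }

    /// Call when recognition ends for any other reason (final result, error, manual stop).
    func stop() {
        timer?.invalidate()
        timer = nil
    }

    deinit {
        timer?.invalidate()
    }
}
```

With this in place, the controller could hold something like SilenceTimer(timeout: 2) { [weak self] in self?.stopRecording() } in a property, call restart() inside the resultHandler, and call stop() from stopRecording().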

You can use a timer to achieve this. Start the timer as soon as you start the audio engine to recognize speech.

  1. As long as speech keeps being recognized, the timer is restarted continuously.
  2. If there is silence for the fixed number of seconds, the selector method is called and stops the recognition.

Below is the code:

    func timerReStart() {
        if timer != nil {
            timer?.invalidate()
            timer = nil
        }
        // Change the interval as per the requirement
        timer = Timer.scheduledTimer(timeInterval: 20, target: self, selector: #selector(self.handleTimerValue), userInfo: nil, repeats: false)
    }
    
    @objc func handleTimerValue() {
        cancelRecording()
    }
    
    func timerStop() {
        guard timer != nil else { return }
        timer?.invalidate()
        timer = nil
    }
    
    func startRecording() {
        
        // Clear all previous session data and cancel task
        if recognitionTask != nil {
            recognitionTask?.cancel()
            recognitionTask = nil
        }

        // Create instance of audio session to record voice
        let audioSession = AVAudioSession.sharedInstance()
        do {
            // .defaultToSpeaker is only valid with the .playAndRecord category, so use .duckOthers here
            try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
            try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
        } catch {
            print("audioSession properties weren't set because of an error.")
        }

        self.recognitionRequest = SFSpeechAudioBufferRecognitionRequest()

        let inputNode = audioEngine.inputNode

        guard let recognitionRequest = recognitionRequest else {
            fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
        }

        recognitionRequest.shouldReportPartialResults = true
        self.recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in

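            var isFinal = false
            if let result = result {
                // Unwrap once instead of optional-chaining and force-unwrapping
                self.lblText.text = result.bestTranscription.formattedString
                print(result.bestTranscription.formattedString)
                print(result.isFinal)
                isFinal = result.isFinal
                // A new result arrived, so restart the silence countdown
                self.timerReStart()
            }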
            
            if error != nil || isFinal {

                self.audioEngine.stop()
                inputNode.removeTap(onBus: 0)

                self.recognitionRequest = nil
                self.recognitionTask = nil
                self.btnStart.isEnabled = true
                self.timerStop()
            }
            
        })

        let recordingFormat = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
            self.recognitionRequest?.append(buffer)
        }
        self.audioEngine.prepare()

        do {
            try self.audioEngine.start()
            //Start timer to check if there is silence
            self.timerReStart()
        } catch {
            print("audioEngine couldn't start because of an error.")
        }
        self.lblText.text = "Say something, I'm listening!"

    }
    
    func cancelRecording() {
        if audioEngine.isRunning {
            let node = audioEngine.inputNode
            node.removeTap(onBus: 0)
            audioEngine.stop()
            recognitionTask?.cancel()
            recognitionTask = nil
        }
        self.timerStop()
    }
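
Since the question title mentions the task delegate, the same silence timer could also be driven from SFSpeechRecognitionTaskDelegate callbacks instead of a resultHandler closure. The following is a rough sketch under stated assumptions: the class name SilenceStoppingDelegate, the transcript property, and the onSilence callback are illustrative, and only the two delegate methods and recognitionTask(with:delegate:) come from the Speech framework. It also assumes the recognizer delivers callbacks on the main queue, which is its default.

```swift
import Foundation
import Speech

/// Hypothetical delegate object: restarts a countdown on every partial
/// transcription and calls `onSilence` when no new text arrives in time.
final class SilenceStoppingDelegate: NSObject, SFSpeechRecognitionTaskDelegate {
    private(set) var transcript = ""
    private var timer: Timer?
    private let timeout: TimeInterval
    private let onSilence: () -> Void

    init(timeout: TimeInterval, onSilence: @escaping () -> Void) {
        self.timeout = timeout
        self.onSilence = onSilence
        super.init()
    }

    // Called for each new partial transcription: store it and restart the countdown.
    func speechRecognitionTask(_ task: SFSpeechRecognitionTask,
                               didHypothesizeTranscription transcription: SFTranscription) {
        transcript = transcription.formattedString
        restartTimer()
    }

    // Called when the task has ended, successfully or not: cancel any pending countdown.
    func speechRecognitionTask(_ task: SFSpeechRecognitionTask,
                               didFinishSuccessfully successfully: Bool) {
        timer?.invalidate()
        timer = nil
    }

    private func restartTimer() {
        timer?.invalidate()
        timer = Timer.scheduledTimer(withTimeInterval: timeout, repeats: false) { [weak self] _ in
            self?.timer = nil
            self?.onSilence()
        }
    }
}
```

The task would then be created with speechRecognizer?.recognitionTask(with: recognitionRequest, delegate: silenceDelegate), where silenceDelegate is stored in a property so that it stays alive for the duration of the task. The onSilence closure could call cancelRecording(), or end the audio on the request if a final result is still wanted.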