Voice-to-text conversion in Swift 3


I am currently working on a project that has to take the user's voice as input and convert it to text in real time. I tried several sample projects with Java Sphinx, but I am struggling with writing the grammar file. Is there any way of doing this in Swift 3? If so, please help me with it.


There are 3 best solutions below


As of iOS 10, Apple provides the best solution to this problem for iOS developers. You can now integrate your app with SiriKit (tutorial here). It is all Siri's responsibility to manage the voice-to-text recognition, and the advantages of using this kit are:

  1. It keeps getting more powerful as iOS versions are updated, and once you integrate it, there is no need to change your code.

  2. You write fewer lines of code than when using a third-party library.

  3. You don't need to maintain the kit as you would with a third-party library; it is all Apple's duty to manage everything about SiriKit.
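As a starting point, the first step of any SiriKit integration is requesting Siri authorization. This is only a minimal sketch of that step, assuming an iOS 10+ target and an NSSiriUsageDescription key in Info.plist; the intent handling itself is beyond this answer:

```swift
import Intents

// Ask the user for permission to use Siri with this app.
INPreferences.requestSiriAuthorization { status in
    switch status {
    case .authorized:
        print("Siri authorized")
    default:
        print("Siri not authorized")
    }
}
```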


Here is an example of how you can use SFSpeechRecognizer to convert voice to text.

First of all, import the Speech framework in your .swift file.

Then conform to the SFSpeechRecognizerDelegate protocol, like this:

public class ViewController: UIViewController, SFSpeechRecognizerDelegate {
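To show what that conformance buys you, here is a minimal sketch of a delegate method you might implement; the class and locale match the snippet above, and the print statements are purely illustrative:

```swift
import UIKit
import Speech

public class ViewController: UIViewController, SFSpeechRecognizerDelegate {

    private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!

    public override func viewDidLoad() {
        super.viewDidLoad()
        speechRecognizer.delegate = self
    }

    // Called when the recognizer's availability changes (e.g. loss of network).
    public func speechRecognizer(_ speechRecognizer: SFSpeechRecognizer,
                                 availabilityDidChange available: Bool) {
        print(available ? "Recognizer available" : "Recognizer unavailable")
    }
}
```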

Then declare the properties below:

private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!

private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?

private var recognitionTask: SFSpeechRecognitionTask?

private let audioEngine = AVAudioEngine()

After that, make sure you have permission to use speech recognition.
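Permission is requested with SFSpeechRecognizer.requestAuthorization; you also need the NSSpeechRecognitionUsageDescription and NSMicrophoneUsageDescription keys in your Info.plist. A minimal sketch:

```swift
import Speech

SFSpeechRecognizer.requestAuthorization { authStatus in
    // The callback is not guaranteed to run on the main thread,
    // so hop back to it before touching any UI.
    OperationQueue.main.addOperation {
        switch authStatus {
        case .authorized:
            print("Speech recognition authorized")
        case .denied, .restricted, .notDetermined:
            print("Speech recognition not available")
        }
    }
}
```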

Now use this code to convert voice to text

let audioSession = AVAudioSession.sharedInstance()
try audioSession.setCategory(AVAudioSessionCategoryRecord)
try audioSession.setMode(AVAudioSessionModeMeasurement)
try audioSession.setActive(true, with: .notifyOthersOnDeactivation)

recognitionRequest = SFSpeechAudioBufferRecognitionRequest()

guard let inputNode = audioEngine.inputNode else { fatalError("Audio engine has no input node") }
guard let recognitionRequest = recognitionRequest else { fatalError("Unable to create a SFSpeechAudioBufferRecognitionRequest object") }

// Configure request so that results are returned before audio recording is finished
recognitionRequest.shouldReportPartialResults = true

// A recognition task represents a speech recognition session.
// We keep a reference to the task so that it can be cancelled.
recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest) { result, error in
    var isFinal = false

    if let result = result {

        // Here is the text of your voice
        print(result.bestTranscription.formattedString)
        isFinal = result.isFinal
    }

    if error != nil || isFinal {
        self.audioEngine.stop()
        inputNode.removeTap(onBus: 0)

        self.recognitionRequest = nil
        self.recognitionTask = nil

    }
}

let recordingFormat = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
    self.recognitionRequest?.append(buffer)
}

audioEngine.prepare()

try audioEngine.start()
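When you want to stop listening (for example from a button action), you end the audio capture and the request; a sketch consistent with the snippet above:

```swift
// Stop capturing audio and tell the request that no more audio is coming;
// the recognizer then delivers the final result to the task's handler.
audioEngine.stop()
inputNode.removeTap(onBus: 0)
recognitionRequest?.endAudio()
```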

Disclaimer: the code above has been taken from here.


It seems like you have not done much research on this topic.

Anyway, there are several ways you can achieve what you want, for example:

  • Use the Speech framework by Apple itself. You can find tutorials for the Speech framework here and here, and you can look at the framework details over here.

  • Use OpenEars (an open-source library for speech recognition).

Hope this helps :)