I just ran an interesting test: a speech recogniser service feeding NSSpeechSynthesizer to echo back what I said.
However, NSSpeechSynthesizer is notorious for being slow and unresponsive, and I wanted to know if anyone has tried optimising this by dedicating a core, a thread, or the GPU (via Metal) to process both recognition and synthesis.
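To show what I mean by picking a thread, here is roughly the shape of what I've been trying (a minimal sketch: the queue label and the `echo` function are just illustrative, and as far as I know macOS only exposes QoS hints, not a way to pin work to a specific core):

    import AppKit

    // Dedicated serial queue with a high QoS for the speech work.
    let speechQueue = DispatchQueue(label: "speech.pipeline", qos: .userInteractive)
    let synthesizer = NSSpeechSynthesizer()

    func echo(_ recognisedText: String) {
        speechQueue.async {
            // startSpeaking is itself asynchronous; the point is to keep any
            // set-up and pre-processing off the main/render thread.
            _ = synthesizer.startSpeaking(recognisedText)
        }
    }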
I've been reading the following article to better understand how to pipeline values through a Metal buffer: http://memkite.com/blog/2014/12/30/example-of-sharing-memory-between-gpu-and-cpu-with-swift-and-metal-for-ios8/
The author uses Metal to offload the sigmoid function used in ML, which makes complete sense, since vector maths is what GPUs do best.
However, I would like to know whether anyone has explored sending other kinds of data, for example float values from a waveform, or even rendering the synthesis itself through the GPU.
In particular, has anyone tried this with NSSpeechRecognizer or NSSpeechSynthesizer?
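To be concrete about the waveform part, this is the kind of CPU/GPU shared buffer I have in mind, following the data-sharing idea from the article above (a minimal sketch in Swift: the 4096-sample block is a placeholder and no compute kernel is dispatched here):

    import Metal

    // The data-sharing half only: put waveform samples where both CPU and GPU
    // can see them.
    guard let device = MTLCreateSystemDefaultDevice() else {
        fatalError("Metal is not supported on this machine")
    }

    let samples = [Float](repeating: 0.0, count: 4096) // placeholder waveform block
    let buffer = device.makeBuffer(bytes: samples,
                                   length: samples.count * MemoryLayout<Float>.stride,
                                   options: .storageModeShared)

    // Reading the shared memory back on the CPU side after a (hypothetical)
    // compute pass has written into it:
    if let buffer = buffer {
        let pointer = buffer.contents().bindMemory(to: Float.self, capacity: samples.count)
        let roundTripped = UnsafeBufferPointer(start: pointer, count: samples.count)
        print(roundTripped.count) // 4096
    }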
As it stands, I have a full 3D scene with HRTF spatial audio, and both recognition and synthesis work, but there is sometimes a noticeable lag. Would dedicating a buffer pipeline through the GPU via MTLDevice, and then handing the result back for playback, be worth trying?
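This is the "render to a file, then play it back" round-trip I'm picturing at the ends of that pipeline (a sketch only: the temporary AIFF path is an assumption, and the GPU stage in the middle is exactly the part I'm asking about):

    import AppKit
    import AVFoundation

    let synthesizer = NSSpeechSynthesizer()
    let outputURL = URL(fileURLWithPath: NSTemporaryDirectory())
        .appendingPathComponent("echo.aiff") // assumed temporary destination

    // Render the synthesized speech to a file instead of straight to the output device.
    _ = synthesizer.startSpeaking("echoed phrase goes here", to: outputURL)

    // Once synthesis has finished (e.g. in the delegate's
    // speechSynthesizer(_:didFinishSpeaking:) callback), play the file back:
    var player: AVAudioPlayer?
    func playSynthesizedFile() {
        player = try? AVAudioPlayer(contentsOf: outputURL)
        player?.play()
    }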