Use GPU for speech with NSSpeechSynthesizer and NSSpeechRecognizer on OS X


I just ran an interesting test: a speech recogniser service built on NSSpeechRecognizer, with NSSpeechSynthesizer echoing back what I said.
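For context, here is a minimal sketch of that echo loop. The command phrases are placeholders, and the exact delegate signature can vary slightly between SDK versions:

```swift
import AppKit

// Minimal echo loop: NSSpeechRecognizer matches a fixed command list and
// NSSpeechSynthesizer speaks each recognized command straight back.
final class SpeechEcho: NSObject, NSSpeechRecognizerDelegate {
    private let recognizer = NSSpeechRecognizer()          // failable if recognition is unavailable
    private let synthesizer = NSSpeechSynthesizer()

    func start() {
        recognizer?.commands = ["hello", "rotate", "stop"]  // placeholder phrases
        recognizer?.listensInForegroundOnly = false
        recognizer?.delegate = self
        recognizer?.startListening()
    }

    // Called on the main thread when one of the commands is heard.
    func speechRecognizer(_ sender: NSSpeechRecognizer, didRecognizeCommand command: String) {
        _ = synthesizer.startSpeaking(command)               // echo the recognized phrase
    }
}
```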

However, NSSpeechSynthesizer is notorious for being slow and unresponsive, and I wanted to know if anyone has tried optimising this by specifying either a core, a thread or the GPU (using Metal) to process both recognition and synthesis.

I've been reading the following article to better understand how to pipeline values through a Metal buffer: http://memkite.com/blog/2014/12/30/example-of-sharing-memory-between-gpu-and-cpu-with-swift-and-metal-for-ios8/

The author uses Metal to offload the sigmoid function used in ML, which makes complete sense since vector maths is what GPUs do best.

However, I would like to know if anyone has explored sending other types of data through the same path, for example float values from a waveform, in order to render speech synthesis through the GPU. A rough sketch of what I mean is below.
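This is roughly the kind of thing I have in mind: copy waveform samples into a CPU/GPU-shared MTLBuffer, dispatch a compute pass over them, and read the results back. The kernel name "processSamples" is hypothetical and would have to exist in the app's default Metal library:

```swift
import Metal

// Sketch: push waveform samples into a GPU-visible buffer, run a compute
// kernel over them, and read the processed samples back on the CPU.
func processWaveform(_ samples: [Float]) -> [Float]? {
    guard let device = MTLCreateSystemDefaultDevice(),
          let queue = device.makeCommandQueue(),
          let library = device.makeDefaultLibrary(),
          let function = library.makeFunction(name: "processSamples"),   // hypothetical kernel
          let pipeline = try? device.makeComputePipelineState(function: function) else { return nil }

    let length = samples.count * MemoryLayout<Float>.stride
    // .storageModeShared keeps one allocation visible to both CPU and GPU.
    guard let buffer = device.makeBuffer(bytes: samples, length: length, options: .storageModeShared),
          let commandBuffer = queue.makeCommandBuffer(),
          let encoder = commandBuffer.makeComputeCommandEncoder() else { return nil }

    encoder.setComputePipelineState(pipeline)
    encoder.setBuffer(buffer, offset: 0, index: 0)
    let threadsPerGroup = MTLSize(width: pipeline.threadExecutionWidth, height: 1, depth: 1)
    let groups = MTLSize(width: (samples.count + threadsPerGroup.width - 1) / threadsPerGroup.width,
                         height: 1, depth: 1)
    encoder.dispatchThreadgroups(groups, threadsPerThreadgroup: threadsPerGroup)
    encoder.endEncoding()

    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()

    // Read the processed samples straight back out of the shared buffer.
    let result = buffer.contents().bindMemory(to: Float.self, capacity: samples.count)
    return Array(UnsafeBufferPointer(start: result, count: samples.count))
}
```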

In particular, has anyone tried this with NSSpeechRecognizer or NSSpeechSynthesizer?

As it stands, I have a full 3D scene with 3D HRTF sound, and both recognition and synthesis work, but sometimes there's a noticeable lag. Would dedicating a buffer pipeline through the GPU's MTLDevice and then playing back the resulting file help?
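For the "then play back the file" half of that idea (leaving the Metal stage aside), this is the round trip I'm picturing. It's only a sketch, assuming the didFinishSpeaking delegate callback fires once the file has been fully written; the output path is a placeholder:

```swift
import AppKit
import AVFoundation

// Sketch of the file round trip: render the synthesized phrase to an AIFF file
// (synthesis runs asynchronously), then play it back once NSSpeechSynthesizer
// reports that it has finished.
final class OfflineSpeaker: NSObject, NSSpeechSynthesizerDelegate {
    private let synthesizer = NSSpeechSynthesizer()
    private let outputURL = URL(fileURLWithPath: NSTemporaryDirectory())
        .appendingPathComponent("speech.aiff")             // placeholder path
    private var player: AVAudioPlayer?

    override init() {
        super.init()
        synthesizer.delegate = self
    }

    func speak(_ phrase: String) {
        // Render to a file instead of speaking through the default output device
        // while the 3D audio scene is running.
        _ = synthesizer.startSpeaking(phrase, to: outputURL)
    }

    // Fired when synthesis has finished writing the file.
    func speechSynthesizer(_ sender: NSSpeechSynthesizer, didFinishSpeaking finishedSpeaking: Bool) {
        guard finishedSpeaking else { return }
        player = try? AVAudioPlayer(contentsOf: outputURL)
        player?.play()                                      // or feed the file into the HRTF mixer instead
    }
}
```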
