I'm using joint positions from a Kinect camera as my state space but I think it's going to be too large (25 joints x 30 per second) to just feed into SARSA or Qlearning.
Right now I'm using the Kinect Gesture Builder program which uses Supervised Learning to associate user movement to specific gestures. But that requires supervised training which I'd like to move away from. I figure the algorithm might pick up certain associations between joints that I would when I classify the data myself (hands up, step left, step right, for example).
I think feeding that data into a deep neural network and then pass that into a reinforcement learning algorithm might give me a better result.
There was a paper on this recently. https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
I know Accord.net has both deep neural networks and RL but has anyone combined them together? Any insights?
If I understand correctly from your question + comment, what you want is to have an agent that performs discrete actions using a visual input (raw pixels from a camera). This looks exactly like what DeepMind guys recently did, extending the paper you mentioned. Have a look at this. It is the newer (and better) version of playing Atari games. They also provide an official implementation, which you can download here. There is even an implementation in Neon which works pretty well.
Finally, if you want to use continuous actions, you might be interested in this very recent paper.
To recap: yes, somebody combined DNN + RL, it works and if you want to use raw camera data to train an agent with RL, this is definitely one way to go :)