Q-Learning Intermediate Rewards

203 Views Asked by Uzay Macar At 04 December 2018 at 23:10

If a Q-Learning agent actually performs noticeably better against opponents in a specific card game when intermediate rewards are included, would this show a flaw in the algorithm or a flaw in its implementation?

There is 1 best solution below

Gracie On 18 January 2019 at 08:55 BEST ANSWER

It's difficult to answer this question without more specific information about the Q-Learning agent. You might think of the seeking of immediate rewards as exploitation, which trades off directly against exploration: under an ε-greedy policy, for instance, the agent exploits with probability 1 − ε, where ε is the exploration rate. It should be possible to configure this and the learning rate in your implementation. The other important factor is the choice of exploration strategy, and you should have no difficulty finding resources to help with that choice. For example:

http://www.ai.rug.nl/~mwiering/GROUP/ARTICLES/Exploration_QLearning.pdf

https://www.cs.mcgill.ca/~vkules/bandits.pdf
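To make the two configurable quantities mentioned above concrete, here is a minimal sketch of a tabular Q-Learning agent with an ε-greedy policy. It is only an illustration under assumptions, not the asker's implementation: the class name `QAgent`, the parameter names `alpha`, `gamma`, and `epsilon`, and their default values are all hypothetical, and a real card-game agent may need a different state representation.

```python
import random
from collections import defaultdict


class QAgent:
    """Tabular Q-learning with an epsilon-greedy policy (illustrative sketch).

    All names and default values here are assumptions made for the example.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.2, seed=None):
        self.actions = list(actions)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor for future rewards
        self.epsilon = epsilon  # exploration rate; exploits with probability 1 - epsilon
        self.q = defaultdict(float)  # (state, action) -> estimated value, 0.0 by default
        self.rng = random.Random(seed)

    def greedy_action(self, state):
        # Exploitation: pick the action with the highest current estimate.
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def choose_action(self, state):
        # Exploration: with probability epsilon, pick a uniformly random action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.greedy_action(state)

    def update(self, state, action, reward, next_state, done):
        # Standard Q-learning target: reward plus discounted best next value
        # (no bootstrap term when the episode has ended).
        best_next = 0.0 if done else max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

The `reward` passed to `update` can be an intermediate reward or only the final game outcome; the update rule is the same in both cases. Lowering `epsilon` shifts the agent from exploration toward exploitation.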

To answer the question directly: the cause may lie in the implementation, the configuration, the agent architecture, or the learning strategy, any of which can lead to immediate exploitation and a fixation on local optima.
