Is Monte Carlo learning policy or value iteration (or something else)?

I am taking a Reinforcement Learning class and I don't understand how the concepts of policy iteration/value iteration relate to Monte Carlo (and also TD/SARSA/Q-learning). In the table below, how should the empty cells be filled: can it be a binary yes/no, a short string description, or is it more complicated than that?

[Table image from the question omitted]

1 Answer

Value iteration and policy iteration are model-based methods for finding an optimal policy: they work with the Markov decision process (MDP) of the environment, i.e., its transition probabilities and rewards. The main premise behind reinforcement learning is that you don't need the full MDP of an environment to find an optimal policy, so traditionally value iteration and policy iteration are not considered RL (although understanding them is key to RL concepts). Value iteration and policy iteration learn "indirectly" in the sense that they use a model of the environment and then extract the optimal policy from that model.
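To make "model-based" concrete, here is a minimal value iteration sketch on a made-up 2-state, 2-action MDP (the numbers in `P` and `R` are arbitrary toy values, not from the question). Note that the algorithm reads the full transition model `P` and reward table `R` directly, which is exactly what an RL method is not allowed to do:

```python
import numpy as np

# Hypothetical toy MDP: the full model (P and R) is assumed known.
P = np.array([  # P[s, a, s'] = transition probability
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.0, 1.0]],
])
R = np.array([  # R[s, a] = expected immediate reward
    [1.0, 0.0],
    [0.0, 2.0],
])
gamma = 0.9

V = np.zeros(2)
for _ in range(1000):
    # Bellman optimality backup: a full sweep over states using the model.
    Q = R + gamma * (P @ V)      # Q[s, a]
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=1)        # greedy policy extracted from the model
```

Policy iteration would reach the same fixed point by alternating full policy evaluation with greedy policy improvement, but both variants lean on `P` and `R` at every step.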

"Direct" learning methods do not attempt to construct a model of the environment. They might search for an optimal policy in the policy space or utilize value function-based (a.k.a. "value based") learning methods. Most approaches you'll learn about these days tend to be value function-based.

Within value function-based methods, there are two primary types of reinforcement learning methods:

  • Policy iteration-based methods
  • Value iteration-based methods
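For contrast, a minimal tabular Q-learning sketch (a value iteration-based, model-free method; the environment dynamics in `step` are a hypothetical toy and are hidden from the learner). The learner only ever sees sampled `(s, a, r, s')` transitions, never the transition probabilities:

```python
import random

random.seed(0)  # for reproducibility of this toy run

n_states, n_actions = 2, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, eps = 0.1, 0.9, 0.1

def step(s, a):
    # Made-up environment dynamics, hidden from the learning algorithm.
    s_next = random.randrange(n_states)
    r = 1.0 if (s == 1 and a == 1) else 0.0
    return r, s_next

s = 0
for _ in range(5000):
    # epsilon-greedy action selection
    if random.random() < eps:
        a = random.randrange(n_actions)
    else:
        a = max(range(n_actions), key=lambda a_: Q[s][a_])
    r, s_next = step(s, a)
    # Q-learning update: a value iteration-style max backup,
    # driven by samples instead of the model.
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
    s = s_next
```

SARSA would differ only in bootstrapping from the action actually taken next rather than the max, which gives it a policy iteration flavor.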

Your homework is asking you, for each of the RL methods in your table, whether it is based on policy iteration or value iteration.

A hint: one of those five RL methods is not like the others.