Why is the bandit problem also called a one-step/one-state MDP in reinforcement learning?

781 Views Asked by vaibhav At 18 August 2025 at 02:06

What do we mean by a one-step/one-state MDP (Markov decision process)?

There are 2 best solutions below

Sanit On 11 February 2020 at 20:08 BEST ANSWER

Let us consider an n-action, one-state MDP. Regardless of which action you take, you stay in the same state. You do, though, get a reward whose distribution depends only on the action you took. If you wish to maximise the long-term reward in this setting, all you need to do is work out which of the n available choices (actions) is the best.

This is exactly what the bandit problem is.
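To make this concrete, here is a minimal sketch of such an MDP in Python. The class name `OneStateMDP`, the Gaussian reward noise, and the mean rewards are assumptions made up for illustration; they are not part of any library or of the question itself.

```python
import random


class OneStateMDP:
    """Hypothetical n-action MDP with a single state (state 0)."""

    def __init__(self, mean_rewards, seed=0):
        self.mean_rewards = list(mean_rewards)  # assumed mean reward per action
        self.state = 0                          # the only state there is
        self.rng = random.Random(seed)

    def step(self, action):
        # The reward distribution depends only on the chosen action
        # (Gaussian noise with unit variance is an assumption).
        reward = self.rng.gauss(self.mean_rewards[action], 1.0)
        # The transition is trivial: we always land back in state 0.
        return self.state, reward


def estimate_action_values(env, pulls_per_action=2000):
    """Average the observed reward of each action, i.e. each bandit arm."""
    estimates = []
    for action in range(len(env.mean_rewards)):
        total = 0.0
        for _ in range(pulls_per_action):
            _, reward = env.step(action)
            total += reward
        estimates.append(total / pulls_per_action)
    return estimates


env = OneStateMDP([0.1, 0.5, 0.9])
estimates = estimate_action_values(env)
best_action = max(range(len(estimates)), key=estimates.__getitem__)
print("estimated values:", estimates)
print("best action:", best_action)
```

Since the state never changes, the agent never has to plan ahead: comparing the per-action averages is enough, which is the bandit setting. Pulling every arm equally often is only the simplest way to get those averages; it is not meant as a recommended exploration strategy.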

Mochan On 11 February 2020 at 14:20

In a bandit problem, past lever pulls do not affect what a lever will pay out.

The reward depends only on which lever is pulled now, not on anything that happened in the past.

So there is effectively only one state.
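The same point can be written out with the Bellman optimality equation. This is a sketch in notation that is an assumption here, not something from the question: a single state s, a discount factor γ in [0, 1), and an expected reward r̄(a) for each action a. Because every action leads back to s with probability 1, the equation collapses to:

```latex
Q^*(a) = \bar{r}(a) + \gamma \max_{a'} Q^*(a')
\quad\Longrightarrow\quad
\arg\max_a Q^*(a) = \arg\max_a \bar{r}(a)
```

The term γ max Q*(a') is the same for every action, so it cannot change which action is best. Planning over the long run therefore gives the same answer as looking one step ahead, which is why both "one-state" and "one-step" are used for the bandit problem.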
