I am trying to understand how to use mdptoolbox and had a few questions.
What does 20 mean in the following statement?
P, R = mdptoolbox.example.forest(10, 20, is_sparse=False)
I understand that 10 here denotes the number of possible states. What does 20 mean here? Does it represent the total number of actions per state? I want to restrict the MDP to exactly 2 actions per state. How could I do this?
The shape of P returned above is (2, 10, 10). What does 2 represent here? No matter what values I use for total states and actions, it is always 2.
Your code is correct, but forest is one of the toolbox's built-in example problems, so its arguments are specific to that example rather than to MDPs in general.
Please go through its documentation carefully.
In the following code:
P, R = mdptoolbox.example.forest(10, 20, is_sparse=False)
The second argument is not the number of actions in the MDP. According to the documentation, the second argument (r1) is the reward received when the forest is in its oldest state and the action Wait is performed. In your case, that reward is passed as 20.
In this example, the forest is managed by two actions: 'Wait' and 'Cut'. Please refer to this documentation for more details. Since only 2 actions are possible, the first dimension of the transition probability matrix P returned by this function also has size 2. You do not need to manually restrict the action space to 2.
To understand how to use this toolbox, you should also go through this link.
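To make the shapes concrete, here is a minimal NumPy sketch of how a forest-style problem can be laid out. It is not the toolbox's own code: forest_sketch is a hypothetical helper, and its transition and reward layout is my assumption based on the documented meaning of the arguments (S states, r1 for Wait in the oldest state, r2 for Cut in the oldest state, p as the wildfire probability). It only illustrates why the first dimension of P is 2 regardless of the second argument.

```python
import numpy as np


def forest_sketch(S=3, r1=4, r2=2, p=0.1):
    """Hypothetical re-creation of a forest-style MDP (not the toolbox's code)."""
    # The action axis is fixed at 2: index 0 = Wait, index 1 = Cut.
    P = np.zeros((2, S, S))

    # Wait: with probability 1 - p the forest ages by one state
    # (the oldest state stays where it is); with probability p a
    # fire sends it back to state 0.
    P[0] = (1 - p) * np.diag(np.ones(S - 1), 1)
    P[0, :, 0] = p
    P[0, S - 1, S - 1] = 1 - p

    # Cut: the forest always returns to state 0.
    P[1, :, 0] = 1

    # Rewards are indexed as (state, action).
    R = np.zeros((S, 2))
    R[S - 1, 0] = r1   # Wait in the oldest state
    R[:, 1] = 1        # Cut in an intermediate state (assumed value)
    R[0, 1] = 0        # nothing to cut in the youngest state
    R[S - 1, 1] = r2   # Cut in the oldest state
    return P, R


P, R = forest_sketch(10, 20)
print(P.shape)   # (2, 10, 10): 2 actions, 10 x 10 transitions each
print(R.shape)   # (10, 2): one reward per state and action
print(R[-1, 0])  # 20.0: the second argument ends up as a reward
```

Changing the second argument only changes one entry of R; the number of actions never depends on it.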