Solving a Discrete Cake Eating Problem with MDPtoolbox in R: why does the policy function show that we eat more cake than is available?

Asked by EconJohn on 27 June 2025 at 20:27 (58 views)

I am currently learning about Markov decision processes and have been using the MDPtoolbox package in R.

I am practicing coding a "discrete cake eating problem", in which we analyze how a consumer should eat a cake made up of 3 slices. This is a simple problem because it has no stochastic elements.
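
For reference, a problem of this kind is commonly written as a Bellman equation. The following is a sketch under assumptions that are not stated in the post: the state s is the number of slices remaining, c is the number of slices eaten in the current period, and u and β stand for the utility function and discount factor (the code below uses u(c) = c^0.5 and 0.8).

```latex
V(s) = \max_{c \in \{0, 1, \dots, s\}} \bigl\{ u(c) + \beta \, V(s - c) \bigr\},
\qquad s \in \{0, 1, 2, 3\}
```

The constraint c ≤ s is what keeps the consumer from eating more cake than remains.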

The R code I have is the following:

#Cake Eating with the MDPtoolbox Package
require(MDPtoolbox)

#Transition Matrices from eating 0,1,2 and 3 pieces of cake

C_0<-matrix(c(1,0,0,0,
              1,0,0,0,
              1,0,0,0,
              1,0,0,0),
            ncol=4, nrow=4,
            byrow=TRUE)

C_1<-matrix(c(0,1,0,0,
              0,1,0,0,
              0,1,0,0,
              0,1,0,0),
            ncol=4, nrow=4,
            byrow=TRUE)


C_2<-matrix(c(0,0,1,0,
              0,0,1,0,
              0,0,1,0,
              0,0,1,0),
            ncol=4, nrow=4,
            byrow=TRUE)


C_3<-matrix(c(0,0,0,1,
              0,0,0,1,
              0,0,0,1,
              0,0,0,1),
            ncol=4, nrow=4,
            byrow=TRUE)

T<-list(C_0=C_0,C_1=C_1,C_2=C_2,C_3=C_3)

#Utilities from eating 0, 1,2 and 3 pieces of cake.

U_func<-function(x){x^0.5}

U_C0<-U_func(0)
U_C1<-U_func(1)
U_C2<-U_func(2)
U_C3<-U_func(3)

#Reward Matrix
W<-matrix(c(U_C0,U_C0,U_C0,U_C0,
            U_C1,U_C1,U_C1,U_C0,
            U_C2,U_C2,U_C0,U_C0,
            U_C3,U_C0,U_C0,U_C0),
          ncol=4, nrow=4,byrow=TRUE)

#MDP Check
mdp_check(T,W)

#MDP policy iteration
m1<-mdp_policy_iteration(T,W,0.8)

m1
names(T)[m1$policy]

The output from this code is:

> m1
$V
[1] 4.920475 5.920475 6.150593 5.668430

$policy
[1] 3 3 2 1

$iter
[1] 4

$time
Time difference of 0.03619385 secs

> names(T)[m1$policy]
[1] "C_2" "C_2" "C_1" "C_0"

This is peculiar: the optimal policy suggests eating two pieces of cake in each of the first two periods, one piece in the third period, and none in the fourth.

The problem is that the cake is intended to consist of only 3 pieces.

My Question: Why does the optimal policy suggest eating 5 pieces of cake when there are only 3 pieces in this problem? (where is my code broken?)
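
One way to investigate is to compare the matrices above with a formulation in which the next state depends on the current one. The sketch below is not the author's method. It assumes that state i means i - 1 slices remain and that action j means eating j - 1 slices, neither of which the post states. The names `orig_T`, `make_P`, `P_list`, `R_mat` and `penalty` are hypothetical, and it uses a plain base-R value iteration instead of `mdp_policy_iteration` so that it runs without the package.

```r
# Assumption: states are 0..3 slices remaining, actions are eating 0..3 slices.
n_slices <- 3
states   <- 0:n_slices
actions  <- 0:n_slices
beta     <- 0.8                      # discount factor used in the question
U_func   <- function(x) x^0.5        # utility function used in the question
penalty  <- -1e6                     # hypothetical large penalty for infeasible actions

# Rebuild the four matrices from the question: matrix k has ones in column k only.
orig_T <- lapply(1:4, function(k) { P <- matrix(0, 4, 4); P[, k] <- 1; P })
# Diagnostic: if all rows of a matrix are identical, the next state
# does not depend on the current state at all.
rows_identical <- sapply(orig_T, function(P) nrow(unique(P)) == 1)
print(rows_identical)

# Alternative: eating `eat` slices moves s to s - eat when feasible,
# otherwise the state is left unchanged (and the reward is penalised).
make_P <- function(eat) {
  P <- matrix(0, nrow = length(states), ncol = length(states))
  for (s in states) {
    s_next <- if (eat <= s) s - eat else s
    P[s + 1, s_next + 1] <- 1
  }
  P
}
P_list <- lapply(actions, make_P)
names(P_list) <- paste0("C_", actions)

# Reward matrix with rows = states and columns = actions.
R_mat <- outer(states, actions,
               function(s, eat) ifelse(eat <= s, U_func(eat), penalty))

# Plain value iteration on V(s) = max_a { R[s, a] + beta * sum_s' P_a[s, s'] V(s') }.
V <- rep(0, length(states))
for (iter in 1:1000) {
  Q <- sapply(seq_along(actions),
              function(a) R_mat[, a] + beta * as.vector(P_list[[a]] %*% V))
  V_new <- apply(Q, 1, max)
  delta <- max(abs(V_new - V))
  V <- V_new
  if (delta < 1e-10) break
}
policy <- apply(Q, 1, which.max)     # action index chosen in each state
eaten  <- actions[policy]            # slices eaten in each state

# Follow the policy for four periods, starting from a full cake.
s <- n_slices
path <- integer(0)
for (t in 1:4) {
  bite <- eaten[s + 1]
  path <- c(path, bite)
  s <- s - bite
}
print(eaten)
print(path)
```

Under these assumptions the diagnostic reports that every matrix in the question has identical rows, and the alternative formulation never eats more than the slices that remain. `P_list` and `R_mat` have the list-of-matrices and state-by-action shapes that `mdp_check` and `mdp_policy_iteration` accept, although passing them to the package is untested here.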
