I have a problem implementing an MDP (Markov decision process) in Python.
I have these matrices: states (1 x n) and actions (1 x m).
The transition matrix is calculated by this code:
import numpy as np

p = np.zeros((n, n))
for t in range(l - 1):                       # my data is a 1x100 matrix
    p[states[t] - 1, states[t + 1] - 1] += 1   # count observed transitions
for i in range(n):
    p[i, :] = p[i, :] / np.sum(p[i, :])      # normalize each row into probabilities
and the Reward matrix by this code:
Reward = np.zeros(l - 1)
for i in range(l - 1):
    Reward[i] = (states[i + 1] - states[i]) / states[i] * 100   # percentage change between consecutive states
To have the optimal value, "quantecon package" in python is defined by:
ddp = quantecon.markov.DiscreteDP(R, Q, beta)
where Q : transition matrix should be m x n x m
.
Can anyone help me understand how Q can be a (m,n,m) matirx?! Thank you in advance.
If you have n states and m actions, Q will be an array of shape (n, m, n) (not (m, n, m)), where Q[s, a, t] stores the probability that the state in the next period is the t-th state when the current state is the s-th state and the action taken is the a-th action.
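For concreteness, here is a minimal sketch of building such a Q and passing it to DiscreteDP. The values of n, m, beta, R, and p below are placeholders, not taken from your data; it also assumes the transitions do not depend on the action, so the same (n, n) matrix p is reused for every action (if they do depend on the action, estimate a separate p per action from your data):

import numpy as np
import quantecon

n, m = 4, 2        # hypothetical numbers of states and actions
beta = 0.95        # discount factor

# placeholder (n, n) stochastic matrix; in practice use the p estimated above
p = np.full((n, n), 1.0 / n)

# Q[s, a, t] = probability of moving from state s to state t under action a
Q = np.empty((n, m, n))
for a in range(m):
    Q[:, a, :] = p                 # same transitions for every action (assumption)

# R[s, a] = expected reward of taking action a in state s (placeholder values)
R = np.random.rand(n, m)

ddp = quantecon.markov.DiscreteDP(R, Q, beta)
res = ddp.solve(method='policy_iteration')
print(res.v)       # optimal value of each state
print(res.sigma)   # optimal action index for each state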