Modelling the profitability of credit cards with a Markov decision process


This refers to a published paper on modelling the profitability of credit cards with Markov decision processes. I am trying to implement the same approach in Python using MDPtoolbox, but the output is not in the format I expected.

My states are combinations of a customer's current risk score and current credit limit. My actions are increases to the customer's limit.
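
To make the state definition concrete, here is a minimal sketch of one common way to flatten a (risk score, credit limit) pair into the single integer state index that an MDP solver expects. The sizes `N_RISK` and `N_LIMIT` and the helpers `encode` and `decode` are hypothetical names chosen for illustration; they do not come from the paper or from MDPtoolbox.

```python
# Assumed sizes for illustration only: 4 risk-score bands and 4 credit limits.
N_RISK = 4
N_LIMIT = 4


def encode(risk, limit, n_limit=N_LIMIT):
    """Map a (risk band, limit) pair to one flat state index."""
    return risk * n_limit + limit


def decode(state, n_limit=N_LIMIT):
    """Recover the (risk band, limit) pair from a flat state index."""
    return divmod(state, n_limit)


n_states = N_RISK * N_LIMIT  # one state per combination
all_pairs = [decode(s) for s in range(n_states)]
```

With this encoding the model has `N_RISK * N_LIMIT` states rather than `N_RISK`, so a solver returns one action per combination.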

I have prepared my transition probabilities for each state.
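
Before solving, it can help to confirm that every row of every transition matrix is a valid probability distribution. The following is a small sketch of such a check in plain Python; `bad_rows` and the `example` data are hypothetical and only illustrate the idea.

```python
def bad_rows(transitions, tol=1e-9):
    """Return (action, state, row_sum) for each row that is not a valid distribution."""
    bad = []
    for a, matrix in enumerate(transitions):
        for s, row in enumerate(matrix):
            total = sum(row)
            wrong_length = len(row) != len(matrix)  # each matrix must be square
            if wrong_length or min(row) < 0 or abs(total - 1.0) > tol:
                bad.append((a, s, total))
    return bad


# Toy data: two actions, two states.
example = [
    [[0.2, 0.8], [0.5, 0.5]],
    [[0.3, 0.6], [1.0, 0.0]],  # first row does not sum to 1, so it is reported
]
print(bad_rows(example))
```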

When I run the MDP code with the Python MDPtoolbox, I get a policy vector that is insufficient for my purposes, because I need the optimal policy for each combination of risk score and credit limit. My current output only tells me to move each risk band to a new limit, which is too generic.

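import mdptoolbox
import numpy as np

# transitions[a][s][s2]: probability of moving from state s to state s2 under action a (a target limit)
transitions = np.array([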
#Limit1
    [
            [0.2, 0.798, 0.001, 0.001], #s1
            [0.001, 0.1, 0.2, 0.699], #s2
            [0.099, 0.001, 0.8, 0.1], #s3
            [0.001, 0.001, 0.898, 0.1] #s4
    ],
#Limit2
    [
            [0.2, 0.001, 0.001, 0.798], #s1
            [0.001, 0.2, 0.798, 0.001], #s2
            [0.001, 0.4, 0.1, 0.499], #s3
            [0.1, 0.2, 0.001, 0.699] #s4
    ],
#Limit3
    [
            [0.001, 0.1, 0.001, 0.898], #s1
            [0.798, 0.2, 0.001, 0.001], #s2
            [0.001, 0.001, 0.001, 0.997], #s3
            [0.001, 0.2, 0.5, 0.299] #s4
    ],
#Limit4
    [
            [0.2, 0.001, 0.001, 0.798], #s1
            [0.1, 0.001, 0.299, 0.6], #s2
            [0.001, 0.1, 0.001, 0.898], #s3
            [0.001, 0.001, 0.1, 0.898] #s4
    ]
])
# rewards[s][a]: MDPtoolbox reads a 2-D reward array as (state, action)
rewards = np.array([
    [0, 0, 0.9, 0.1],
    [0, 0.8, 0, 0.2],
    [0.1, 0, 0.7, 0.2],
    [0, 0, 0, 1.0]
])
vi = mdptoolbox.mdp.ValueIteration(transitions, rewards, 0.995)  # discount factor 0.995
vi.run()
vi.policy
vi.V

The final policy I get is:

vi.policy
Out[86]: (2, 1, 2, 3)

This only says to move a customer with risk score S1 to limit 2, and so on, which is too generic. What I expect is a policy matrix that tells me how much to increase the limit for each combination of credit risk score and current limit.
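
One possible way to obtain such a matrix, sketched under several assumptions, is to make each (risk score, current limit) pair its own state and reshape the resulting flat policy. Everything in the sketch is hypothetical toy data: `risk_P`, `profit`, the sizes, and the modelling choices that the risk score evolves according to the new limit and that limits never decrease. To stay self-contained it uses a small NumPy value iteration instead of MDPtoolbox; `P` and `R` are built in the same (actions, states, states) and (states, actions) shapes that the post passes to `mdptoolbox.mdp.ValueIteration`.

```python
import numpy as np

# Hypothetical toy sizes and numbers, for illustration only.
N_RISK, N_LIMIT = 2, 3

# risk_P[l, r, r2]: assumed probability of moving from risk band r to r2 at limit l.
risk_P = np.array([
    [[0.9, 0.1], [0.3, 0.7]],  # limit 0
    [[0.8, 0.2], [0.2, 0.8]],  # limit 1
    [[0.7, 0.3], [0.1, 0.9]],  # limit 2
])
# profit[r, l]: assumed one-period profit for risk band r at limit l.
profit = np.array([
    [1.0, 1.5, 2.0],
    [0.5, 0.2, -1.0],
])

n_states = N_RISK * N_LIMIT
P = np.zeros((N_LIMIT, n_states, n_states))  # (actions, states, states)
R = np.zeros((n_states, N_LIMIT))            # (states, actions)

for r in range(N_RISK):
    for l in range(N_LIMIT):
        s = r * N_LIMIT + l                  # flat index of state (r, l)
        for a in range(N_LIMIT):             # action a means "set the limit to a"
            new_l = max(a, l)                # a below the current limit acts like "keep"
            for r2 in range(N_RISK):
                P[a, s, r2 * N_LIMIT + new_l] += risk_P[new_l, r, r2]
            R[s, a] = profit[r, new_l]

# Plain value iteration on the composite model.
discount = 0.95
V = np.zeros(n_states)
for _ in range(1000):
    Q = R + discount * np.einsum("ast,t->sa", P, V)  # Q[s, a]
    V_new = Q.max(axis=1)
    converged = np.max(np.abs(V_new - V)) < 1e-9
    V = V_new
    if converged:
        break

policy = Q.argmax(axis=1)                    # one action per composite state
current = np.arange(n_states) % N_LIMIT      # current limit of each state
# target[r, l]: limit to move to for risk band r with current limit l.
target = np.maximum(policy, current).reshape(N_RISK, N_LIMIT)
print(target)
```

Rows of `target` are risk bands and columns are current limits, so each cell is the recommended limit for one combination. Whether the reward and transition assumptions above match the paper would need to be checked against it.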
