How to implement a finite-horizon MDP in Python?

69 Views Asked by SNAPSEHAMZ At 06 October 2023 at 22:20

I have the following problem from the book Markov Decision Processes by Martin L. Puterman, which I need help solving in Python.

The problem formulation is as such:

An adult female lion requires about 6 kg of meat per day and has a gut capacity of about 30 kg, which means that, with a full gut, it can go six days without eating. Zebras have an average edible biomass of 164 kg, large enough to meet the food demands of many lions. If a female selects to hunt in a group, the probability of a successful kill per chase has been observed to increase with group size to a point, and each chase consumes 0.5 kg of energy. Suppose that the probabilities of a successful hunt are given by p(1) = 0.15, p(2) = 0.33, p(3) = 0.37, p(4) = 0.40, p(5) = 0.42, and p(n) = 0.43 for n >= 6, where p(n) represents the kill probability per chase for lions hunting in a group of size n. Formulate this as a Markov decision process, in which the state represents the lion's energy reserves, the action is whether to hunt, and, if so, in what size group. Assume one hunt per day and that the lion's objective is to maximize its probability of survival over T days. Assume that if the hunt is successful, lions in the hunting group share the meat equally.

I've been trying to wrap my head around how to solve it using Python.

Currently I have defined my elements as:

S = {0,...,30} -> representing the lion's gut contents (kg). a = {no_hunt, hunt1, hunt2, hunt3, hunt4, hunt5, hunt6}. I have excluded group sizes above 6, as they would yield less meat per lion with the same probability of success.
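One detail worth checking in this formulation is the grid for the state. The chase cost is 0.5 kg, so whole-kilogram states cannot represent it exactly. Below is a minimal sketch of one possible encoding, assuming reserves are stored in 0.5 kg units and that a lion cannot keep more meat than its gut capacity. The names `states`, `actions` and `share_kg` are illustrative, not from the book.

```python
# Assumption: work in 0.5 kg units so the 0.5 kg chase cost is exactly one unit.
UNITS_PER_KG = 2
CAPACITY_KG = 30
ZEBRA_KG = 164

# States: reserves of 0, 0.5, ..., 30 kg stored as integers 0..60 (0 = dead).
states = list(range(CAPACITY_KG * UNITS_PER_KG + 1))

# Actions: 0 = do not hunt, n = hunt in a group of n lions (n = 1..6).
actions = list(range(0, 7))


def share_kg(group_size):
    """Meat per lion after an equal split, capped at gut capacity (assumption)."""
    return min(CAPACITY_KG, ZEBRA_KG / group_size)


if __name__ == "__main__":
    for n in actions[1:]:
        print(f"group of {n}: {share_kg(n):.2f} kg per lion")
```

Under these assumptions, an equal share is enough to fill the gut for groups of up to 5 lions, and only the group of 6 yields less than 30 kg per lion.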

I was thinking I would try to implement it in Python and simulate over T days.


import numpy as np

# Constants for the lion's parameters
lion_capacity = 30  # Maximum stomach content
lion_meat_per_day = 6  # Meat required per day
lion_hunt_cost = 0.5  # Meat cost for each hunt
zebra_meat_per_group = 164  # Meat obtained from a successful hunt

# Hunt success probabilities by group size
hunt_success_probabilities = {
    1: 0.15,
    2: 0.33,
    3: 0.37,
    4: 0.4,
    5: 0.42,
    6: 0.43
}

# Number of days to plan for
T = 50

# Initialize state (lion's stomach content) and cumulative reward
state = lion_capacity
cumulative_reward = 0

# Initialize action policy
action_policy = []

# Simulate T days
for day in range(T):
    # Calculate the expected meat yield for each group size
    expected_rewards = []
    for group_size in range(1, 7):
        hunt_prob = hunt_success_probabilities[group_size]
        expected_meat_yield = zebra_meat_per_group / group_size
        adjusted_expected_meat_yield = expected_meat_yield - lion_hunt_cost
        expected_reward = hunt_prob * adjusted_expected_meat_yield
        expected_rewards.append(expected_reward)
    
    # Choose the action that maximizes the expected meat yield while conserving energy
    if lion_capacity >= state - lion_meat_per_day - lion_hunt_cost:
        # If stomach content allows hunting, choose the group size that maximizes expected meat yield
        action = np.argmax(expected_rewards) + 1
        for i in enumerate(expected_rewards):
            print(i)
    else:
        # If stomach content is insufficient for hunting, wait
        action = 0

    # Calculate the reward and update the state based on the chosen action
    if action == 0:  # Wait
        state -= lion_meat_per_day
        reward = 0
    else:
        group_size = action
        hunt_prob = hunt_success_probabilities[group_size]
        # Determine if the hunt is successful
        if np.random.rand() < hunt_prob:
            reward = zebra_meat_per_group / group_size
            state += reward - lion_meat_per_day - lion_hunt_cost
        else:
            reward = -0.5  # Negative reward for unsuccessful hunt
            state -= lion_meat_per_day + lion_hunt_cost  # Daily need and chase cost are both deducted

    cumulative_reward += reward

    # Update action policy
    action_policy.append(action)

# Calculate the probability of survival
prob_survival = state / lion_capacity

print(f"Total Reward: {cumulative_reward}")
print(f"Probability of Survival: {prob_survival}")
print(f"Action Policy: {action_policy}")

But my results always choose either to hunt in a group of 2 or not to hunt at all (action 0). Furthermore, I have no idea whether my thought process is correct or whether this is in fact the correct result.
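Since the objective is the probability of survival rather than the amount of meat collected, a finite-horizon MDP like this is usually solved by backward induction over the days, not by a greedy forward simulation. The following is a minimal sketch under stated assumptions, not a reference solution: reserves are discretized to 0.5 kg units, state 0 means the lion is dead and stays dead, the daily 6 kg need and the 0.5 kg chase cost are deducted before any meat is added, and each lion's share is rounded down to 0.5 kg and capped at gut capacity. The names `transitions` and `backward_induction` are illustrative.

```python
CAP = 60          # 30 kg gut capacity in 0.5 kg units (discretization is an assumption)
DAILY = 12        # 6 kg of meat burned per day
HUNT_COST = 1     # 0.5 kg spent per chase
ZEBRA = 328       # 164 kg of edible biomass per zebra
P_KILL = {1: 0.15, 2: 0.33, 3: 0.37, 4: 0.40, 5: 0.42, 6: 0.43}
ACTIONS = [0] + sorted(P_KILL)   # 0 = no hunt, n = hunt in a group of n


def transitions(s, a):
    """Return [(probability, next_state)] for a living lion (s > 0)."""
    if a == 0:
        return [(1.0, max(0, s - DAILY))]
    base = s - DAILY - HUNT_COST             # reserves after the day's costs
    share = ZEBRA // a                       # equal split, rounded down (assumption)
    fed = max(0, min(CAP, base + share))     # successful hunt
    hungry = max(0, base)                    # failed hunt
    return [(P_KILL[a], fed), (1.0 - P_KILL[a], hungry)]


def backward_induction(T):
    """Return (values, policy) with values[t][s] = max P(alive at day T | state s on day t)."""
    values = [[0.0] * (CAP + 1) for _ in range(T + 1)]
    policy = [[0] * (CAP + 1) for _ in range(T)]
    for s in range(1, CAP + 1):
        values[T][s] = 1.0                   # alive at the horizon counts as survival
    for t in range(T - 1, -1, -1):
        for s in range(1, CAP + 1):          # state 0 (dead) keeps value 0
            best_a, best_v = 0, -1.0
            for a in ACTIONS:
                v = sum(p * values[t + 1][s2] for p, s2 in transitions(s, a))
                if v > best_v + 1e-12:       # ties go to the smaller action
                    best_a, best_v = a, v
            values[t][s] = best_v
            policy[t][s] = best_a
    return values, policy


if __name__ == "__main__":
    values, policy = backward_induction(T=50)
    print("P(survive 50 days | full gut):", values[0][CAP])
    print("Day-0 decision by reserve level (0.5 kg units):", policy[0][1:])
```

Here `values[0][s]` is the maximal survival probability when starting with reserves `s`, and `policy[t][s]` is the group size chosen on day `t` in state `s`, so the decision can depend on both the remaining horizon and the current reserves.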
