I have the following problem from the book Markov Decision Processes by Martin L. Puterman, which I need help solving in Python.
The problem is formulated as follows:
An adult female lion requires about 6 kg of meat per day and has a gut capacity of about 30 kg; which means that, with a full gut, it can go six days without eating. Zebras have an average edible biomass of 164 kg, large enough to meet the food demands of many lions. If a female selects to hunt in a group, the probability of a successful kill per chase has been observed to increase with group size to a point, and that each chase consumes 0.5 kg of energy. Suppose that the probabilities of a successful hunt are given by p(1) = 0.15, p(2) = 0.33, p(3) = 0.37, p(4) = 0.40, p(5) = 0.42, and p(≥6) = 0.43, where p(n) represents the kill probability per chase for lions hunting in a group of size n. Formulate this as a Markov decision process, in which the state represents the lion’s energy reserves, the action is whether to hunt, and, if so, in what size group. Assume one hunt per day and that the lion’s objective is to maximize its probability of survival over T days. Assume that if the hunt is successful, lions in the hunting group share the meat equally.
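To make the numbers in the problem concrete, here is a small sketch of the per-lion share of a 164 kg zebra for each group size. The 30 kg cap on what one lion can actually eat is my assumption, based on the stated gut capacity, and the names `ZEBRA_KG`, `GUT_CAPACITY_KG` and `share_per_lion` are only illustrative.

```python
# Per-lion share of one zebra for each group size.
# Assumption: a lion cannot eat more than its 30 kg gut capacity.
ZEBRA_KG = 164
GUT_CAPACITY_KG = 30


def share_per_lion(group_size):
    """Equal share of the kill in kg, before any gut-capacity cap."""
    return ZEBRA_KG / group_size


for n in range(1, 7):
    raw = share_per_lion(n)
    usable = min(raw, GUT_CAPACITY_KG)  # assumed cap at a full gut
    print(f"group of {n}: share {raw:6.2f} kg, usable {usable:5.2f} kg")
```

Under that assumption, only a group of six gives a share below a full gut; for groups of one to five, each lion's share already exceeds 30 kg.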
I've been trying to wrap my head around how to solve it using Python.
Currently I have defined my elements as:
S = {0, ..., 30}, representing the lion's gut contents (kg). A = {no_hunt, hunt1, hunt2, hunt3, hunt4, hunt5, hunt6}. I have excluded group sizes above 6, since they would yield less meat per lion with the same probability of success.
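As a rough sketch of how these state and action sets could be turned into a finite-horizon MDP, the block below uses backward induction to compute the maximum survival probability. Several things here are assumptions and not part of the problem statement: reserves are discretised in 0.5 kg units (so the chase cost is one unit), the lion is dead once reserves reach 0, reserves are capped at 30 kg, a kill is eaten on the same day, each share is rounded down to whole units, and a group of 6 stands in for "6 or more". The names `solve`, `V` and `policy` are hypothetical.

```python
import numpy as np

UNIT = 0.5                  # kg per state step (assumed discretisation)
CAP = int(30 / UNIT)        # 60 units = full gut
NEED = int(6 / UNIT)        # 12 units consumed per day
COST = int(0.5 / UNIT)      # 1 unit consumed per chase
P = {1: 0.15, 2: 0.33, 3: 0.37, 4: 0.40, 5: 0.42, 6: 0.43}
SHARE = {n: int(164 / n / UNIT) for n in P}  # per-lion share, rounded down


def solve(T):
    """Return (V, policy); V[t, s] is the max survival probability from day t."""
    V = np.zeros((T + 1, CAP + 1))
    V[T, 1:] = 1.0                              # alive at the horizon
    policy = np.zeros((T, CAP + 1), dtype=int)  # 0 = no hunt, n = group size
    for t in range(T - 1, -1, -1):
        for s in range(1, CAP + 1):             # s = 0 stays dead (value 0)
            # Start from the "no hunt" action, then try each group size.
            best, best_a = V[t + 1, max(0, s - NEED)], 0
            for n, p in P.items():
                win = min(CAP, s - NEED - COST + SHARE[n])
                lose = max(0, s - NEED - COST)
                q = p * V[t + 1, win] + (1 - p) * V[t + 1, lose]
                if q > best:
                    best, best_a = q, n
            V[t, s], policy[t, s] = best, best_a
    return V, policy


V, policy = solve(T=50)
print(f"survival probability from a full gut: {V[0, CAP]:.4f}")
print("first-day action by reserve (kg):",
      {s * UNIT: int(policy[0, s]) for s in (6, 12, 24, 36, 48, 60)})
```

Here the terminal reward is 1 if the lion is alive after day T and 0 otherwise, so `V[0, s]` can be read as a survival probability. The resulting policy depends on both the day and the current reserves, and it will change if any of the assumptions above are changed.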
I was thinking I would try to implement it in Python and simulate over T days.
import numpy as np

# Constants for the lion's parameters
lion_capacity = 30  # Maximum stomach content (kg)
lion_meat_per_day = 6  # Meat required per day (kg)
lion_hunt_cost = 0.5  # Energy cost of each hunt (kg)
zebra_meat_per_group = 164  # Meat obtained from a successful hunt (kg)

# Hunt success probabilities by group size
hunt_success_probabilities = {
    1: 0.15,
    2: 0.33,
    3: 0.37,
    4: 0.4,
    5: 0.42,
    6: 0.43,
}

# Number of days to plan for
T = 50

# Initialize state (lion's stomach content) and cumulative reward
state = lion_capacity
cumulative_reward = 0

# Initialize action policy
action_policy = []

# Simulate T days
for day in range(T):
    # Calculate the expected meat yield for each group size
    expected_rewards = []
    for group_size in range(1, 7):
        hunt_prob = hunt_success_probabilities[group_size]
        expected_meat_yield = zebra_meat_per_group / group_size
        adjusted_expected_meat_yield = expected_meat_yield - lion_hunt_cost
        expected_reward = hunt_prob * adjusted_expected_meat_yield
        expected_rewards.append(expected_reward)

    # Choose the action that maximizes the expected meat yield while conserving energy
    if lion_capacity >= state - lion_meat_per_day - lion_hunt_cost:
        # If stomach content allows hunting, choose the group size that maximizes expected meat yield
        action = np.argmax(expected_rewards) + 1
        for i in enumerate(expected_rewards):
            print(i)
    else:
        # If stomach content is insufficient for hunting, wait
        action = 0

    # Calculate the reward and update the state based on the chosen action
    if action == 0:  # Wait
        state -= lion_meat_per_day
        reward = 0
    else:
        group_size = action
        hunt_prob = hunt_success_probabilities[group_size]
        # Determine if the hunt is successful
        if np.random.rand() < hunt_prob:
            reward = zebra_meat_per_group / group_size
            state += reward - lion_meat_per_day - lion_hunt_cost
        else:
            reward = -lion_hunt_cost  # Negative reward for unsuccessful hunt
            # A failed hunt still costs the daily requirement plus the chase cost
            state -= lion_meat_per_day + lion_hunt_cost

    cumulative_reward += reward

    # Update action policy
    action_policy.append(action)

# Calculate the probability of survival
prob_survival = state / lion_capacity

print(f"Total Reward: {cumulative_reward}")
print(f"Probability of Survival: {prob_survival}")
print(f"Action Policy: {action_policy}")
But my results always choose either to hunt in a group of 2 or not to hunt at all (0). Furthermore, I have no idea whether my thought process is correct, or whether this is in fact the correct result.
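For reference, the expected-yield calculation inside the loop can be checked on its own. This is only a sketch that repeats the same arithmetic with the same numbers; the names `probs`, `expected` and `best` are illustrative.

```python
# Recompute the per-day expected net meat yield used in the simulation loop.
probs = {1: 0.15, 2: 0.33, 3: 0.37, 4: 0.40, 5: 0.42, 6: 0.43}
ZEBRA_KG, HUNT_COST_KG = 164, 0.5

# Same formula as in the loop: success probability times (share minus chase cost).
expected = {n: p * (ZEBRA_KG / n - HUNT_COST_KG) for n, p in probs.items()}
for n, value in expected.items():
    print(f"group of {n}: expected yield {value:.2f} kg")

best = max(expected, key=expected.get)
print("greedy choice:", best)
```

None of these values depend on `state` or `day`, so `np.argmax` picks the same group size on every iteration. The action 0 only shows up when the `if` condition fails, that is, when `state` is larger than `lion_capacity + lion_meat_per_day + lion_hunt_cost`.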