Monte Carlo simulation of Birthday paradox in python 3

901 Views Asked by At

The birthday paradox is that everyone has equal probability of having a birthday on any given of 365 days. We start adding people in a room. What is the probability that 2 people have birthdays on same day as a function of number of people in the room? The code I wrote is as follows:

import numpy as np
import matplotlib.pyplot as plt

x=[0]
y=[0]
for j in range(1000):
    if j!=0:
        freq = []
        L1 = list(np.random.randint(low = 1, high=366, size = j))
        result = list((i, L1.count(i)) for i in L1)
        for a_tuple in result:
            freq.append(a_tuple[1])
            print(freq)
        rep = j - freq.count(1)
        prob = rep/j
        y = y + [prob]
        x = x + [j]
print(prob)
plt.plot(x,y)

Here, in L1 = list(np.random.randint(low = 1, high=366, size = j)) I select the day on which someone would have a birthday and in result = list((i, L1.count(i)) for i in L1) I calculate the frequency of birthdays on each day. The entire thing is looped over to account for increasing number of people.

In the following for loop, I isolate the unique events and find repetitions and store the value in rep. Next I calculated the probability as fraction of people sharing birthdays and plotted them as a function of number. However, the question requires me to find the probability of just one shared birthday. How do I calculate that? I think I have to loop this entire thing for number of trials but that just gives an accurate solution with less variations of the same program. Currently my program gives fraction of people having shared birthdays I think.

Birthday problem Wikipedia for better reference

2

There are 2 best solutions below

0
On

NOTE

I assume that when n persons have been in the room, they are all thrown out of the room and then n+1 persons enter the room.

========================================

I would think of it this way;

First, set probs = [0]*365. Now, say 2 persons get in the room - we then write their birthdays onto a piece of paper and check, if those two dates are equal. If they are, we increase probs[2] by 1 (yes, theres some indexes that we don't need, and Python is 0-indexed etc. but to keep it simple).

Now do the same for 3 persons, for 4 persons, for 5 persons ... all the way up to 365. Your array might look something like probs==[0,0,0,0,0,1,0,1,1,0,1,1,1,1,0,1....].

You can now start over from 2 persons (still keeping the same array as before i.e don't create a new one with 0's!), then 3 persons etc. and start over 1000 times. Your array might look like

probs==[0,0,2,0,4,1,5,2,9,12,10,17....,967,998]

If you divide that array by 1000 (elementwise) you now have your simulated probability as a function of n persons.

import numpy as np
import matplotlib.pyplot as plt

N_TOTAL_PERS= 366
N_SIM = 10000 #number of simulations
counts = np.zeros(N_TOTAL_PERS)

for _ in range(N_SIM): 
    for n in range(2,N_TOTAL_PERS):
        b_days = np.random.randint(1,366,size=n) #Get each persons birth-day
        counts [n] += len(b_days) != len(set(b_days)) #Increment if some birthdays are equal    


total_probs = counts/N_SIM #convert to probabilities
total_probs[70] #Get the probability when 70 persons are together (0.9988)
plt.plot(range(N_TOTAL_PERS),total_probs)

which generates a plot that looks like enter image description here

0
On

You should run multiple experiments for different number of people in the room. Note that for N_people > 365, the probability should compute equal to 1.

Refactoring your code, and changing the logic a bit, I came up with the following:

import numpy as np
import matplotlib.pyplot as plt


def random_birthdays(n_people):
    return list(np.random.randint(low=1, high=366, size=n_people))

def check_random_room(n_people):
    """
    Generates a random sample of `n_people` and checks if at least two of them
    have the same birthday
    """
    birthdays = random_birthdays(n_people)
    return len(birthdays) != len(set(birthdays))

def estimate_probability(n_people, n_experiments):
    results = [check_random_room(n_people) for _ in range(n_experiments)]

    return sum(results)/n_experiments


N_EXPERIMENTS = 1000
x = list(range(1, 400))

y = [estimate_probability(x_i, N_EXPERIMENTS) for x_i in x]

plt.plot(x, y)


plt.show()