I am trying to generate random base32 numbers that are 6 characters or less. This should give approximately 1 billion different combinations.
I have created a program to generate these “random” numbers. However, it appears that it generates a duplicate on average every 40,000 generations.
Why are these “random” numbers duplicating so often when there are over a billion different combinations?
Here is my code:
static void Main(string[] args)
{
int seed = Environment.TickCount;
Random r = new Random(seed);
Dictionary<int, int> resultDictionary = new Dictionary<int, int>();
for (int x = 1; x <= 1000; x++)
{
Dictionary<string, int> dictionary = new Dictionary<string, int>();
try
{
while (true)
{
int rand = r.Next(0, 1073741823);
CrockfordBase32Encoding encoding = new CrockfordBase32Encoding();
string encodedRand = encoding.Encode((ulong)rand, false);
dictionary.Add(encodedRand, rand);
}
}
catch (Exception)
{
}
Console.WriteLine(string.Format("{0} - {1}", x, dictionary.Count));
resultDictionary.Add(x, dictionary.Count);
x++;
}
Console.WriteLine();
Console.WriteLine("Average Number Before Duplicate: " + resultDictionary.Average(x => x.Value));
Console.ReadLine();
}
This is similar to the Birthday Problem. Given a group of
n
people, What is the probability that two share the same birthday1? It's higher than you'd think.In your case, what are the odds that randomly picking a number between 0 and 1,073,741,823 n times will give you a duplicate?
One approximation from the link above is
1-exp(-(n*n)/(2*d))
. Ifn=40,000
that equates to about a 52.5% probability that a duplicate is chosen, so seeing duplicates after 40,000 picks on average seems reasonable.1assuming that birthdays are uniformly distributed universally, which is not the case in reality but is "close enough" and makes the math easier