How to seed srand() to avoid collision on a large number of machines?

974 Views Asked by At

Typically, the seeding of srand() is done by:

srand(time(NULL));

In my case, I use random numbers to generate an identifier for my client process at runtime on the network. The process sometimes restarts and generates a new identifier. As the number of clients increases, there's a good chance that two clients call srand(time(NULL)) within the same sec, which creates two identical identifiers, or a collision as seen by the server side. Some people suggested a finer resolution:

srand((time.tv_sec * 1000) + (time.tv_usec / 1000));

But The trouble here is that the seed will repeat every 24 days or so, and when the number of machines is large enough, there's still a chance of collision. There's another solution:

srand(time.tv_usec * time.tv_sec);

But this seems problematic to me too because the the modulus of this product (the higher bits overflow and get abandoned) is not evenly distributed within the range of the unsigned int seed value. For example, for every sec, time.tv_usec == 0 leads to the same seed.

So is there a way to seed srand() in my case?

Edit: the client runs on Linux, Windows, Android and iOS, so /dev/random or /dev/urandom isn't always available.

P.S. I'm aware of the GUID/UUID approach, but I'd like to know if it's possible to just seed srand() properly in this case.

5

There are 5 best solutions below

0
On BEST ANSWER

You have two domains: clients and processes. Therefore you need a unique identifier for each one. Processes obviously can be done with the process-id. For clients, I suggest using the MAC address, which MUST be unique for each network interface. I believe that all the platforms you list support sockets, so the SIOCGIFHWADDR ioctl may be supported.

The only problem is that MAC addresses are 48 bits and PIDs are typically 32 bits, so you have to pick the highest entropy bits of the two values to use for your srand seed. I suggest the lower 16 bits of the PID and the lower 16 bits of the MAC address.

1
On

If you are writing for linux platform, use /dev/random and /dev/urandom to get random numbers. Unlike srand(), /dev/random and /dev/urandom uses noise generated from hardware peripherals to generate random numbers. srand() uses seed supplied as argument to generate random number.

0
On

srand and rand are simply not appropriate if you have a lot of processes or threads that need to have pseudo-randomness that is independent between them.

On POSIX systems you can use the rand48 family of functions like jrand48 that have known state size. If you depend on process, thread and machine independence you should use significant bits from the process ID, thread ID, IP address and time to initialize the state. jrand[48] takes (and modifies) a state of three short, so it should be relatively simple to seed these with the different quanties.

All but one of the systems that you list are POSIX, so this should work there. What would be an adequate fallback for Windows systems, I don't know.

0
On

If two random number generators never have collisions, they're not random. It would be like playing 'snap' but never getting a match.

So, what you want is a worse, rather than a better random number generator. Using GUIDs is indeed one approach that should entirely remove the problem for you.

But, If you're happy just reducing the chance of collisions, rather than eliminating them entirely you could use the machine's IP address (or processor serial number, or somesuch) as part of the seed.

1
On

I think you'll be interested in the paper "Parallel Random Numbers: As Easy as 1, 2, 3".

Quoting from the abstract: "All our PRNGs pass rigorous statistical tests (including TestU01’s BigCrush) and produce at least 264 unique parallel streams of random numbers, each with period 2128 or more"