MySQL RAND() seed values almost repeat

3.2k Views Asked by At

Using MySQL 5.6.21 on Windows 7.

I am attempting to return a 'random' row from a table seeded by the date (so the same row returns for the current day and then switches the next day etc - a "random quote of the day generator" if you like).

I noticed the same rows keep coming up so I simplified the query to its basics, it appears the RAND() function generates very similar numbers every fourth seed value. When rounded to an integer the values appear to repeat every fourth seed. This example only uses 16 rows, but you get the idea.

create table t (i INT);

insert into t values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15);

select i, ceil(rand(i) * 16), rand(i) from t;

drop table t;

Gives...

0   3   0.15522042769493574
1   7   0.40540353712197724
2   11  0.6555866465490187
3   15  0.9057697559760601
4   3   0.15595286540310166
5   7   0.40613597483014313
6   11  0.6563190842571847
7   15  0.9065021936842261
8   3   0.15668530311126755
9   7   0.406868412538309
10  11  0.6570515219653505
11  15  0.907234631392392
12  3   0.15741774081943347
13  7   0.40760085024647497
14  11  0.6577839596735164
15  15  0.9079670691005579

Not what I expected, so what am I doing wrong? I expected a pseudo-random sequence to be generated.

2

There are 2 best solutions below

3
On

According to documentation RAND(n) is working properly only if n is constant. The effect of using a nonconstant argument is undefined. As of MySQL 5.0.13, nonconstant arguments are not permitted.

And as they say RAND() is not meant to be a perfect random generator. It is a fast way to generate random numbers on demand that is portable between platforms for the same MySQL version.

MySQL documentation

Check what will happen if you use RAND() without parameter.

0
On

RAND() is not meant to be a perfect random generator, so if it is not suitable, then other ways of generating random numbers should be used. In my case, I needed to order rows differently per day, and RAND(CURDATE()+0) proved to be a viable solution.

Yes, it does seem as though the first term has a cycle of 4, and it will be given a similar random number each time the seed increments by 4.

I got the following results from my own basic testing of random number generation and incrementing the seed by 4 each time, using the seed number 0, 4, 8, and 12.

  • Row 1: Incremented by about 0.001
  • Row 2: Incremented by about 0.005
  • Row 3: Incremented by about 0.02
  • Row 4: Incremented by about 0.05
  • Row 5: Incremented by about 0.2
  • Row 6: Decremented by about 0.01 (or maybe it incremented by 0.99)
  • Row 7: Incremented by about 0.26

In OP's example, because they don't use a constant seed number, they technically have a sample size of 1. The impact with recurring indexes is lessened as the row count increases, and there is a higher chance that other items will appear before or after other items when they wouldn't have previously.

A couple of caveats I've worked out from using RAND():

If RAND() is only used as a SELECT column or WHERE condition, and a LIMIT is specified, then RAND() will only be generated once for each of the returned rows.

CREATE TABLE t (i INT);
INSERT INTO t VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15);
SELECT i, RAND(0) FROM t LIMIT 0, 1; # 0, 0.15522042769493574
SELECT i, RAND(0) FROM t LIMIT 1, 1; # 1, 0.15522042769493574

If RAND() is used in the ORDER BY statement, then RAND() will be calculated for all matching rows regardless of any LIMIT.

CREATE TABLE t (i INT);
INSERT INTO t VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15);
SELECT i, RAND(0) FROM t ORDER BY RAND(0) LIMIT 0, 1; # 0, 0.15522042769493574
SELECT i, RAND(0) FROM t ORDER BY RAND(0) LIMIT 1, 1; # 6, 0.2964166321758336