I wonder whether the non-uniform distribution scores would reduce Redis Sorted Set performance.
There are two random score generation functions, one obeys uniform distribution, and the other obeys a special custom distribution as the following image(the probability of a lower score is much bigger than the probability of a higher score). The two generators will pick random scores from the same range([score_lower_bound, score_upper_bound)).
score_lower_bound = 0
score_upper_bound = 10000 # exclude
N = int(1e6)
redis_client = redis.Redis(host='127.0.0.1', port=6379)
f = lambda x: 1.1 ** (-0.01 * x + 100)
weights = [int(f(x)) for x in range(score_lower_bound, score_upper_bound)]
def random_score_uniform_distr():
return random.randint(score_lower_bound, score_upper_bound-1)
def random_score_custom_distr():
return random.choices(range(score_lower_bound, score_upper_bound), weights=weights, k=1)[0]
And then, I called the ZADD command repeatedly for N times with a random score generated by the specific generation function and a unique string member/element.
def populate_zset(random_score_func):
# generate random suffix
suffix = ''.join(random.choices(string.ascii_lowercase, k=10))
zset_name = "zset_" + suffix
print("populating...", zset_name)
for i in range(N):
# generate random string as member
member = ''.join(random.choices(string.ascii_lowercase, k=64))
ret = redis_client.zadd(zset_name, {member: random_score_func()})
print("populated: ", zset_name, redis_client.zcard(zset_name))
I ran the test many times and collected some results as follows.
populating... zset_zvzqnrujao
populated: zset_zvzqnrujao 1000000
Time taken to populate zset with uniform distributed scores: 91.79979228973389
populating... zset_ocwlgdohpt
populated: zset_ocwlgdohpt 1000000
Time taken to populate zset with custom distributed scores: 405.32152819633484
populating... zset_anfqzgrbyu
populated: zset_anfqzgrbyu 1000000
Time taken to populate zset with uniform distributed scores: 116.31756711006165
populating... zset_oyrjodoasm
populated: zset_oyrjodoasm 1000000
Time taken to populate zset with custom distributed scores: 473.89297699928284
populating... zset_ezpstuvtmd
populated: zset_ezpstuvtmd 1000000
Time taken to populate zset with uniform distributed scores: 98.64593005180359
populating... zset_rjndappswl
populated: zset_rjndappswl 1000000
Time taken to populate zset with custom distributed scores: 428.9520342350006
populating... zset_qnrocvjzec
populated: zset_qnrocvjzec 1000000
Time taken to populate zset with uniform distributed scores: 104.60574007034302
populating... zset_oyrchapofd
populated: zset_oyrchapofd 1000000
Time taken to populate zset with custom distributed scores: 434.5851089954376
I think the non-uniform distribution of scores will affect Sorted Set performance because the underlying SkipList degenerates to the normal LinkedList when some tightly adjacent scores(nodes) are accessed more frequently than others, the "next pointer" from the higher layer does not take its advantage. Maybe it is similar to storing a sorted list of numbers in an unbalanced search tree.
