Do the non-uniform distribution scores reduce Redis Sorted Set performance?

66 Views Asked by At

I wonder whether the non-uniform distribution scores would reduce Redis Sorted Set performance.

There are two random score generation functions, one obeys uniform distribution, and the other obeys a special custom distribution as the following image(the probability of a lower score is much bigger than the probability of a higher score). The two generators will pick random scores from the same range([score_lower_bound, score_upper_bound)).

Custom Scores Distribution

score_lower_bound = 0
score_upper_bound = 10000 # exclude
N = int(1e6)

redis_client = redis.Redis(host='127.0.0.1', port=6379)
f = lambda x: 1.1 ** (-0.01 * x + 100)
weights = [int(f(x)) for x in range(score_lower_bound, score_upper_bound)]

def random_score_uniform_distr():
    return random.randint(score_lower_bound, score_upper_bound-1)

def random_score_custom_distr():
    return random.choices(range(score_lower_bound, score_upper_bound), weights=weights, k=1)[0]

And then, I called the ZADD command repeatedly for N times with a random score generated by the specific generation function and a unique string member/element.

def populate_zset(random_score_func):
    # generate random suffix
    suffix = ''.join(random.choices(string.ascii_lowercase, k=10))
    zset_name = "zset_" + suffix
    print("populating...", zset_name)

    for i in range(N):
        # generate random string as member
        member = ''.join(random.choices(string.ascii_lowercase, k=64))
        ret = redis_client.zadd(zset_name, {member: random_score_func()})
    
    print("populated: ", zset_name, redis_client.zcard(zset_name))

I ran the test many times and collected some results as follows.

populating... zset_zvzqnrujao
populated:  zset_zvzqnrujao 1000000
Time taken to populate zset with uniform distributed scores: 91.79979228973389
populating... zset_ocwlgdohpt
populated:  zset_ocwlgdohpt 1000000
Time taken to populate zset with custom distributed scores: 405.32152819633484

populating... zset_anfqzgrbyu
populated:  zset_anfqzgrbyu 1000000
Time taken to populate zset with uniform distributed scores: 116.31756711006165
populating... zset_oyrjodoasm
populated:  zset_oyrjodoasm 1000000
Time taken to populate zset with custom distributed scores: 473.89297699928284

populating... zset_ezpstuvtmd
populated:  zset_ezpstuvtmd 1000000
Time taken to populate zset with uniform distributed scores: 98.64593005180359
populating... zset_rjndappswl
populated:  zset_rjndappswl 1000000
Time taken to populate zset with custom distributed scores: 428.9520342350006

populating... zset_qnrocvjzec
populated:  zset_qnrocvjzec 1000000
Time taken to populate zset with uniform distributed scores: 104.60574007034302
populating... zset_oyrchapofd
populated:  zset_oyrchapofd 1000000
Time taken to populate zset with custom distributed scores: 434.5851089954376

I think the non-uniform distribution of scores will affect Sorted Set performance because the underlying SkipList degenerates to the normal LinkedList when some tightly adjacent scores(nodes) are accessed more frequently than others, the "next pointer" from the higher layer does not take its advantage. Maybe it is similar to storing a sorted list of numbers in an unbalanced search tree.

0

There are 0 best solutions below