From the run time of programs that call hash() on variable-length strings, I suspected that the hash() function might take constant time regardless of string length. To verify this assumption I used the following strategy -
- Create strings of length k
- Hash each string and record the hash() operation time t
- Repeat for k varying from 0 to 100,000 and generate a plot of string length k vs time t
 
Hence, if my guess that the hash() function is a constant-time operation on strings is correct, can you please explain in layman's terms why that is so? A conceptual or theoretical explanation would be preferable to a reference to source code - how do even very large strings produce a hash instantaneously, regardless of character length?
The following is the code implementation of the above-mentioned strategy -
import random
import time
import pylab
def form_str(k, n):
    """
        Form k-length strings of arbitrary characters, n in number.
    """
    letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
    for i in range(n):
        yield ''.join(random.choice(letters) for _ in range(k))
def time_hash(k, n):
    """
        Find the total time to hash n strings of k length each.
    """
    
    total_time = 0.0
    for element in form_str(k, n):
        # time.perf_counter() gives better resolution than time.time()
        start_time = time.perf_counter()
        # Because hash() works almost instantaneously we repeat
        # it 100,000 times to get a non-zero run time.
        for i in range(100000):
            hash(element)
        end_time = time.perf_counter()
        total_time += (end_time - start_time)
    return round(total_time, 2)
# print(time_hash(100, 100))  
def plot_time():
    """
        Plots the time vs string length (k) over a range of k-values.
    """
    x_values, y_values = [], []
    for k in range(0, 100000, 5000):
        x_values.append(k)
        y_values.append(time_hash(k, 100))
    # print(y_values)
    pylab.figure(1)
    pylab.title('Hash Time Complexity')
    pylab.xlabel('Length of string')
    pylab.ylabel('Time taken to hash string (100 x 100000 times)')
    pylab.plot(x_values, y_values, 'r:', label='Hash time')
    pylab.legend()
    pylab.show()
plot_time()
# From the plot it is clear that, regardless of the string length,
# hashing takes roughly the same time.
Since strings are immutable, the hash code of a string is computed only once and cached thereafter; every later hash() call on the same string object simply returns the cached value.
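You can see the caching directly by timing the first and second hash() calls on one large, freshly built string (a minimal sketch; the exact timings are machine-dependent, but the gap is dramatic):

```python
import time

# A fresh 10-million-character string whose hash has not been computed yet.
s = 'x' * 10_000_000

t0 = time.perf_counter()
hash(s)                         # first call: scans all 10 million characters
first = time.perf_counter() - t0

t0 = time.perf_counter()
hash(s)                         # second call: returns the cached hash code
second = time.perf_counter() - t0

print(first > second)           # the first call is far slower than the second
```

This is exactly why your benchmark looks constant-time: of the 100,000 hash() calls on each string, only the very first one actually does any work proportional to the string's length.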
A better way to benchmark would be to generate distinct (unique) strings of length k and average the time of hashing each one exactly once, instead of calling hash() on the same string repeatedly.
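A sketch of that corrected benchmark (the strings are built up front so string construction is excluded from the timed region; each fresh string is hashed exactly once, so the cache never helps):

```python
import random
import string
import time

def time_hash_unique(k, n):
    """Average time to hash n distinct fresh k-length strings once each."""
    strings = [''.join(random.choices(string.ascii_letters, k=k))
               for _ in range(n)]
    start = time.perf_counter()
    for s in strings:
        hash(s)                 # first (and only) hash of each string
    return (time.perf_counter() - start) / n

# With fresh strings, hashing cost grows roughly linearly with length k.
short_avg = time_hash_unique(100, 200)
long_avg = time_hash_unique(100_000, 200)
print(long_avg > short_avg)
```

Plotting these averages over a range of k should show the linear growth that the original benchmark hid.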