I want to split my values between buckets by associating each value with its hash. As the hash I am using Python's built-in hash() function, whose range is -abs(sys.maxsize) to sys.maxsize.
I have written a function that lists the bucket boundaries for a given range. I don't understand why I don't get exactly the same values back when I split the range. The values are returned in non-scientific notation. Is that the issue? Is there a better way to do this?
I am doing all of this so that I can bin a column with pandas afterwards: https://www.datasciencemadesimple.com/binning-or-bucketing-of-column-in-pandas-python-2/
import sys

# default values (min and max are the bounds of the values returned by hash())
min = -abs(sys.maxsize)
max = sys.maxsize
n_buckets = 5

def get_buckets(min, max, n_buckets):
    # total length of the range to split
    hash_length = abs(min - max)
    # size of each bucket (floor division)
    bucket_size = hash_length // n_buckets
    i = 0
    buckets = [min]
    # for each bucket i, compute its start value (and the end value for the last bucket)
    while i < n_buckets:
        i = i + 1
        last_elem = buckets[-1]
        new_size = last_elem + bucket_size
        buckets.append(new_size)
        print(buckets)
        print(i)
    return bucket_size

print(get_buckets(min, max, n_buckets))
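
One possible explanation for the mismatch, offered as an assumption rather than a confirmed diagnosis: `hash_length // n_buckets` is a floor division, so adding `bucket_size` repeatedly leaves the last boundary short of `max` by the remainder of that division. Python integers are arbitrary precision, so the notation used for printing should not change the values. The sketch below shows one way to compute boundaries that land exactly on both ends, and to look up the bucket for a hash with the standard `bisect` module. The names `bucket_edges` and `bucket_index` are hypothetical helpers, not part of the code above.

```python
import sys
from bisect import bisect_right


def bucket_edges(lo, hi, n_buckets):
    """Return n_buckets + 1 integer edges running from lo to hi (hypothetical helper)."""
    span = hi - lo
    # Multiply before dividing so the rounding error does not accumulate:
    # for k == n_buckets this gives lo + span, which is exactly hi.
    return [lo + span * k // n_buckets for k in range(n_buckets + 1)]


def bucket_index(value, edges):
    """Return the 0-based bucket of value; out-of-range values are clamped."""
    idx = bisect_right(edges, value) - 1
    # Clamp so that value == edges[-1] falls in the last bucket and
    # anything below edges[0] falls in the first one.
    return max(0, min(idx, len(edges) - 2))


lo, hi = -sys.maxsize, sys.maxsize
edges = bucket_edges(lo, hi, 5)

print(edges)
print(edges[0] == lo, edges[-1] == hi)
print(bucket_index(hash("some value"), edges))
```

Because the remainder is spread over the buckets, some of them are one unit wider than others. Integer boundaries like these can then be used as explicit bin edges in the pandas binning step.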