I am new to python and trying to create a bloomFilter based on Bit torrent BEP 33. I have created Bloom Filter but it is not exactly what I am looking for. Here's what I need and I have not completely understood this situation. If someone here can explain ...
//fixed parameters
k = 2
m = 256*8
//the filter
byte[m/8] bloom ## What is this part?
function insertIP(byte[] ip) {
byte[20] hash = sha1(ip)
int index1 = hash[0] | hash[1] << 8
int index2 = hash[2] | hash[3] << 8
// truncate index to m (11 bits required)
index1 %= m ## ?
index2 %= m ## ?
// set bits at index1 and index2
bloom[index1 / 8] |= 0x01 << index1 % 8 ## ??
bloom[index2 / 8] |= 0x01 << index2 % 8 ## ??
}
// insert IP 192.168.1.1 into the filter:
insertIP(byte[4] {192,168,1,1})
And this is what I have created
import hashlib
m = 2048
def hashes(s):
index = [0, 0]
#for c in s:
#o = ord(c)
index[0] = hashlib.sha224(index[0]).hexdigest ## This needs integer hash
index[1] = hashlib.sha224(index[1]).hexdigest ## same as above
return [x % m for x in index]
class BloomFilter(object):
def __init__(self):
self.bitarray = [0] * m
def add(self, s):
for x in hashes(s):
self.bitarray[x] = 1
#print self.bitarray
def query(self, s):
return all(self.bitarray[x] == 1 for x in hashes(s))
shazib=BloomFilter()
shazib.add('192.168.0.1')
print shazib.query('192.168.0.1')
First, an explanation of the code...
This is the most baffling line to me;
k
isn't used at all...This is the number of bits in 256 bytes.
bloom
is an array of 256 bytes, i.e. 256 * 8 bits, i.e.m
bits. Each bit inbloom
will contain information about what values are in the filter.This creates a 20-byte hash of
ip
.These two lines calculate two indices into
bloom
based on the hash. Basically,index1
is the concatenation of the first two bytes ofhash
, andindex2
is the concatenation of the second two bytes ofhash
.These two lines truncate the values so that they don't exceed the range of possible indices into
bloom
. The%
is the mod operator; it returns the remainder after division. (17 % 4 = 1, 22 % 5 = 2 and so on.) Remember that bloom is 256 * 8 bits long? Eleven bits allows us to encode 2 ** 11 possible indices, i.e. 2048 values, i.e. 256 * 8 values.We're treating
bloom
as a bit-array, so we have to do some bit-twiddling to access the correct bit. First, divideindexA
by 8, to get the correct byte, then truncateindexA
using the%
operator to get the correct bit within that byte.And voila, we have a bloom filter. If you printed it out bitwise, it would look like this:
And if a particular i.p., when hashed, generates an
index1
of5
and anindex2
of9
, then it would be considered "in" the filter, because the bits at indices5
and9
are set to1
. Of course, there can be false positives, because multiple different values could result in the same indices; but there can be no false negatives.Here's your first problem.
index[0]
andindex[1]
need to be integers. Also,hashlib.sha224(index[0]).hexdigest
returns a method. You have to call the method to get anything out of it, like this:hashlib.sha224(index[0]).hexdigest()
. Also, if you want this to work in a way that's identical to the above code, you could convert the hash to an int (you can useint(x, 16)
to convert a hexadecimal string into an integer) and then extract the first two bytes using& 65535
, then shift it by two bytes using>> 16
, then extract those two bytes using& 65535
again. Once you've got that correct, the rest works.