I have the following code, which right-packs the set bits within every 4-bit group (nibble) of a 64-bit integer. This is the naive approach, using a lookup table and a loop. Is there a faster way to do this with bit twiddling, SWAR/SIMD, or some other parallel technique? (msb() returns the index of the most significant set bit.)
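
    def msb(x):
        # Not shown in the original question; assumed here to return the
        # index of the most significant set bit (x must be non-zero).
        return x.bit_length() - 1

    def pack(X):
        # compact[a] has as many 1 bits as a, packed into the low positions.
        compact = [
            0b0000,  # 0000
            0b0001,  # 0001
            0b0001,  # 0010
            0b0011,  # 0011
            0b0001,  # 0100
            0b0011,  # 0101
            0b0011,  # 0110
            0b0111,  # 0111
            0b0001,  # 1000
            0b0011,  # 1001
            0b0011,  # 1010
            0b0111,  # 1011
            0b0011,  # 1100
            0b0111,  # 1101
            0b0111,  # 1110
            0b1111,  # 1111
        ]
        K = 0
        while X:
            i = msb(X)
            j = (i // 4) * 4                  # bit offset of the nibble holding the MSB
            a = (X & (0b1111 << j)) >> j      # extract that nibble
            K |= compact[a] << j              # store its packed form
            X = X & ~(0b1111 << j)            # clear the nibble so the loop advances
        return K
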
An alternative that does not need any special SIMD instruction is to take each of the 4 bits into account separately:
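One way this could look is sketched below. It is only an illustration under a few assumptions: the input fits in 64 bits, and the names `LOW` and `pack_swar` are made up for this example. The idea is to isolate each of the four bit positions of every nibble into the nibble's lowest bit, then build output bit k as "at least k+1 of the four input bits are set", using only AND, OR and shifts on the whole word at once.

```python
# Mask selecting the lowest bit of each of the 16 nibbles (hypothetical name).
LOW = 0x1111111111111111

def pack_swar(x):
    """Right-pack the set bits of every nibble of a 64-bit value (sketch)."""
    # Bring each of the four bit positions down to the nibble's lowest bit.
    a = x & LOW
    b = (x >> 1) & LOW
    c = (x >> 2) & LOW
    d = (x >> 3) & LOW

    ab_and, ab_or = a & b, a | b
    cd_and, cd_or = c & d, c | d

    # Threshold functions, evaluated for all 16 nibbles in parallel.
    ge1 = ab_or | cd_or                          # at least one bit set
    ge2 = ab_and | cd_and | (ab_or & cd_or)      # at least two bits set
    ge3 = (ab_and & cd_or) | (cd_and & ab_or)    # at least three bits set
    ge4 = ab_and & cd_and                        # all four bits set

    # Output bit k of each nibble is "count > k".
    return ge1 | (ge2 << 1) | (ge3 << 2) | (ge4 << 3)
```

For example, `pack_swar(0xA5F0)` gives `0x33F0`: the nibbles `1010` and `0101` both become `0011`, while `1111` and `0000` are unchanged. There is no loop and no table lookup, so the cost is a fixed number of word-wide operations regardless of the input.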