Accessing 8-bit data as 7-bit

I have an array of 100 uint8_t values, which is to be treated as a stream of 800 bits and processed 7 bits at a time. In other words, if the first element of the 8-bit array holds 0b11001100 and the second holds 0b11110000, then when I read it in 7-bit format, the first element of the 7-bit array would be 0b1100110 and the second would be 0b0111100, with the remaining 2 bits held in the third. The first thing I tried was a union...

struct uint7_t {
    uint8_t i1:7;
};

union uint7_8_t {
    uint8_t u8[100];
    uint7_t u7[115];
};

but of course everything is byte aligned, so I essentially end up simply losing the 8th bit of each element.

Does anyone have any ideas on how I can go about doing this?

Just to be clear, this is something of a visual representation of the result of the union:

xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx 32 bits of 8 bit data
0xxxxxxx 0xxxxxxx 0xxxxxxx 0xxxxxxx 32 bits of 7-bit data.

And this represents what it is that I want to do instead:

xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx 32 bits of 8 bit data
xxxxxxx xxxxxxx xxxxxxx xxxxxxx xxxx 32 bits of 7-bit data.

I'm aware the last bits may be padded, but that's fine. I just want some way of accessing the data 7 bits at a time without losing any of the 800 bits. So far the only way I can think of is lots of bit shifting, which of course would work, but is there a cleaner way of going about it?

Thanks in advance for any answers.

There are 8 best solutions below

0
On

Here is a solution that uses the std::vector<bool> specialization. It also uses a similar mechanism to allow access to the seven-bit elements via reference (proxy) objects.

The member functions allow for the following operations:

uint7_t x{5};               // simple value
Arr<uint7_t> arr(10);       // array of size 10
arr[0] = x;                 // set element
uint7_t y = arr[0];         // get element
arr.push_back(uint7_t{9});  // add element
arr.push_back(x);           //
std::cout << "Array size is " 
    << arr.size() << '\n';  // get size
for(auto&& i : arr) 
    std::cout << i << '\n'; // range-for to read values
int z{50};
for(auto&& i : arr)
    i = z++;                // range-for to change values
auto&& v = arr[1];          // get reference to second element
v = 99;                     // change second element via reference

Full program:

#include <vector>
#include <iterator>
#include <iostream>

struct uint7_t {
    unsigned int i : 7;
};

struct seven_bit_ref {
    size_t begin;
    size_t end;
    std::vector<bool>& bits;

    seven_bit_ref& operator=(const uint7_t& right)
    {
        auto it{bits.begin()+begin};
        for(int mask{1}; mask != 1 << 7; mask <<= 1)
            *it++ = right.i & mask;
        return *this;
    }

    operator uint7_t() const
    {
        uint7_t r{};
        auto it{bits.begin() + begin};
        for(int i{}; i < 7; ++i)
            r.i += *it++ << i;
        return r;
    }

    seven_bit_ref operator*()
    {
        return *this;
    }

    void operator++()
    {
        begin += 7;
        end += 7;
    }

    bool operator!=(const seven_bit_ref& right)
    {
        return !(begin == right.begin && end == right.end);
    }

    seven_bit_ref operator=(int val)
    {
        uint7_t temp{};
        temp.i = val;
        operator=(temp);
        return *this;
    }

};

template<typename T>
class Arr;

template<>
class Arr<uint7_t> {
public:
    Arr(size_t size) : bits(size * 7, false) {}

    seven_bit_ref operator[](size_t index)
    {
        return {index * 7, index * 7 + 7, bits};
    }
    size_t size()
    {
        return bits.size() / 7;
    }
    void push_back(uint7_t val)
    {
        for(int mask{1}; mask != 1 << 7; mask <<= 1){
            bits.push_back(val.i & mask);
        }
    }

    seven_bit_ref begin()
    {
        return {0, 7, bits};
    }

    seven_bit_ref end()
    {
        return {size() * 7, size() * 7 + 7, bits};
    }

    std::vector<bool> bits;
};

std::ostream& operator<<(std::ostream& os, uint7_t val)
{
    os << val.i;
    return os;
}

int main()
{
    uint7_t x{5};               // simple value
    Arr<uint7_t> arr(10);       // array of size 10
    arr[0] = x;                 // set element
    uint7_t y = arr[0];         // get element
    arr.push_back(uint7_t{9});  // add element
    arr.push_back(x);           //
    std::cout << "Array size is " 
        << arr.size() << '\n';  // get size
    for(auto&& i : arr) 
        std::cout << i << '\n'; // range-for to read values
    int z{50};
    for(auto&& i : arr)
        i = z++;                // range-for to change values
    auto&& v = arr[1];          // get reference
    v = 99;                     // change via reference
    std::cout << "\nAfter changes:\n";
    for(auto&& i : arr)
        std::cout << i << '\n';
}
0
On

Here is one approach without the manual shifting. This is just a crude proof of concept, but hopefully you will be able to get something out of it. I don't know whether you can easily transform your input into a bitset, but I think it should be possible.

int bytes = 0x01234567;
bitset<32> bs(bytes);
cout << "Input: " << bs << endl;
for(int i = 0; i < 5; i++)
{
    bitset<7> slice(bs.to_string().substr(i*7, 7));
    cout << slice << endl;
}

Also, this is probably much less performant than the bit-shifting version, so I wouldn't recommend it for heavy lifting.
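On the open question of getting the input into a bitset: one possible sketch, assuming the 100-byte array from the question and MSB-first bit order (as in the question's example), is to copy the bits over one at a time. The names kBytes, kBits, to_bitset and group7 are made up for this illustration.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kBytes = 100;        // array size from the question
constexpr std::size_t kBits = kBytes * 8;  // 800 bits in total

// Copies the byte array into a bitset so that stream bit 0 (the most
// significant bit of data[0]) lands at bitset position 0.
inline std::bitset<kBits> to_bitset(const std::uint8_t (&data)[kBytes]) {
    std::bitset<kBits> bs;
    for (std::size_t i = 0; i < kBits; ++i)
        bs[i] = ((data[i / 8] >> (7 - i % 8)) & 1u) != 0;
    return bs;
}

// Reads the index'th 7-bit group; positions past the end read as 0 (padding).
inline unsigned group7(const std::bitset<kBits>& bs, std::size_t index) {
    unsigned value = 0;
    for (std::size_t j = 0; j < 7; ++j) {
        const std::size_t pos = index * 7 + j;
        value = (value << 1) | ((pos < kBits && bs[pos]) ? 1u : 0u);
    }
    return value;
}
```

With data[0] = 0b11001100 and data[1] = 0b11110000, group7 returns 0b1100110 for index 0 and 0b0111100 for index 1. This avoids the string round trip, but it still touches every bit individually, so it is unlikely to beat plain shifting.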

0
On

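You can use this to get the index'th 7-bit element from in. Note that it has no end-of-array handling: it always reads in[idx+1], so for the last elements it can read one byte past the buffer. It also counts bits from the least significant bit of each byte. Simple, fast.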

int get7(const uint8_t *in, int index) {
    int fidx = index*7;
    int idx = fidx>>3;
    int sidx = fidx&7;

    return (in[idx]>>sidx|in[idx+1]<<(8-sidx))&0x7f;
}
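A bounds-aware variant might look like the sketch below. It keeps the same LSB-first bit order as get7; the extra size parameter and the name get7_checked are assumptions added for this example, and bits past the end of the buffer are treated as zero padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns the index'th 7-bit group from `in` (LSB-first, like get7).
// `size` is the number of valid bytes in `in`; bits past the end read as 0.
inline int get7_checked(const std::uint8_t* in, std::size_t size, std::size_t index) {
    const std::size_t bit = index * 7;
    const std::size_t byte = bit >> 3;                    // byte holding the first bit
    const unsigned shift = static_cast<unsigned>(bit & 7); // offset within that byte
    assert(byte < size && "group starts past the end of the buffer");

    const unsigned lo = in[byte];
    const unsigned hi = (byte + 1 < size) ? in[byte + 1] : 0u;  // zero padding
    return static_cast<int>(((lo >> shift) | (hi << (8 - shift))) & 0x7Fu);
}
```

For the 100-byte array in the question, indices 0 through 114 are valid, and index 114 returns the final two bits padded with zeros.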
1
On

Not sure what you mean by "cleaner". Generally people who work on this sort of problem regularly consider shifting and masking to be the right primitive tool to use. One can do something like defining a bitstream abstraction with a method to read an arbitrary number of bits off the stream. This abstraction sometimes shows up in compression applications. The internals of the method of course do use shifting and masking.

One fairly clean approach is to write a function which extracts a 7-bit number at any bit index in an array of unsigned chars. Use a division to convert the bit index to a byte index, and a modulus to get the bit index within the byte. Then shift and mask. The input bits can span two bytes, so you either have to glue together a 16-bit value before extraction, or do two smaller extractions and OR them together to construct the result.
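As a rough sketch of the 16-bit "glue" variant, assuming the MSB-first bit order used in the question's example: the function name extract7 and the size parameter are invented for this illustration, and bits past the end of the buffer are read as zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Extracts 7 bits starting at bit position `bitIndex`, counting from the most
// significant bit of data[0]. Bits past the end of the buffer read as 0.
inline std::uint8_t extract7(const std::uint8_t* data, std::size_t size,
                             std::size_t bitIndex) {
    const std::size_t byteIndex = bitIndex / 8;                    // starting byte
    const unsigned bitInByte = static_cast<unsigned>(bitIndex % 8); // offset from its MSB
    assert(byteIndex < size);

    // Glue the current and the next byte into one 16-bit window.
    const unsigned next = (byteIndex + 1 < size) ? data[byteIndex + 1] : 0u;
    const unsigned window = (static_cast<unsigned>(data[byteIndex]) << 8) | next;

    // The field occupies window bits (15 - bitInByte) down to (9 - bitInByte).
    return static_cast<std::uint8_t>((window >> (9 - bitInByte)) & 0x7Fu);
}
```

Calling extract7(data, 100, 7 * k) then yields the k'th 7-bit element. With the bytes 0b11001100 and 0b11110000 it returns 0b1100110 for bit index 0 and 0b0111100 for bit index 7, matching the question.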

If I were aiming for something moderately performant, I'd likely take one of two approaches:

The first has two state variables saying how many bits to take from the current and the next byte. It would use shifting, masking, and bitwise OR to produce the current output (a number between 0 and 127 as an int, for example). The loop would then update both state variables via addition and modulus, and would increment the current byte pointer once all bits in the first byte were consumed.
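A streaming loop along these lines could be sketched as follows. This is a variation rather than the exact design described: instead of two explicit counters it keeps one small accumulator plus a count of pending bits. It assumes MSB-first order, and split7 is a made-up name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Splits `data` into 7-bit groups, MSB-first. The final group is zero-padded.
inline std::vector<std::uint8_t> split7(const std::uint8_t* data, std::size_t size) {
    std::vector<std::uint8_t> out;
    out.reserve((size * 8 + 6) / 7);

    std::uint32_t acc = 0;  // bits not yet emitted, held in the low `bits` bits
    unsigned bits = 0;      // number of pending bits in acc (at most 14)
    for (std::size_t i = 0; i < size; ++i) {
        acc = (acc << 8) | data[i];
        bits += 8;
        while (bits >= 7) {
            bits -= 7;
            out.push_back(static_cast<std::uint8_t>((acc >> bits) & 0x7Fu));
        }
        acc &= (1u << bits) - 1u;  // drop the bits that were just emitted
    }
    if (bits > 0)  // left-align the leftover bits in a final padded group
        out.push_back(static_cast<std::uint8_t>((acc << (7 - bits)) & 0x7Fu));
    return out;
}
```

For 100 input bytes this produces 115 groups, the last of which carries the final 2 bits followed by 5 zero bits.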

The second approach is to load 56-bits (8 outputs worth of input) into a 64-bit integer and use a fully unrolled structure to extract each of the 8 outputs. Doing this without using unaligned memory reads requires constructing the 64-bit integer piecemeal. (56-bits is special because the starting bit position is byte aligned.)
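The 56-bit idea can be sketched like this, again assuming MSB-first order. The name unpack56 is hypothetical, and the loops have fixed trip counts, so an optimizing compiler will typically unroll them, though that is not guaranteed.

```cpp
#include <cstdint>

// Unpacks 7 input bytes (56 bits) into eight 7-bit values, MSB-first.
inline void unpack56(const std::uint8_t in[7], std::uint8_t out[8]) {
    // Build the 56-bit word byte by byte: no unaligned loads, and the result
    // does not depend on the host's byte order.
    std::uint64_t word = 0;
    for (int i = 0; i < 7; ++i)
        word = (word << 8) | in[i];

    // Group k occupies word bits (55 - 7k) down to (49 - 7k).
    for (int k = 0; k < 8; ++k)
        out[k] = static_cast<std::uint8_t>((word >> (49 - 7 * k)) & 0x7Fu);
}
```

A caller would step through the input 7 bytes at a time and handle the final partial block (100 is not a multiple of 7) by copying it into a zero-filled 7-byte buffer first.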

To go real fast, I might try writing SIMD code in Halide. That's beyond scope here I believe. (And not clear it is going to win much actually.)

Designs which read more than one byte into an integer at a time will likely have to consider processor byte ordering.

0
On

The following code works as you have asked for it, but first the output and live example on ideone.

Output:

Before changing values...:
7 bit representation: 1111111 0000000 0000000 0000000 0000000 0000000 0000000 0000000 
8 bit representation: 11111110 00000000 00000000 00000000 00000000 00000000 00000000 

After changing values...:
7 bit representation: 1000000 1001100 1110010 1011010 1010100 0000111 1111110 0000000 
8 bit representation: 10000001 00110011 10010101 10101010 10000001 11111111 00000000 

8 Bits: 11111111 to ulong: 255
7 Bits: 1111110 to ulong: 126

After changing values...:
7 bit representation: 0010000 0101010 0100000 0000000 0000000 0000000 0000000 0000000 
8 bit representation: 00100000 10101001 00000000 00000000 00000000 00000000 00000000 

It is very straightforward, using a std::bitset in a class called BitVector. I implement one getter and one setter. The getter returns a std::bitset of size M (a template argument) for the given index selIdx. The index is multiplied by M to get the right position. The returned bitset can also be converted to numerical or string values.
The setter takes a uint8_t value as input, along with the index selIdx. The bits are shifted to the right position in the bitset.

Furthermore, you can use the getter and setter with different sizes because of the template argument M, which means you can work with either a 7-bit or an 8-bit representation, but also 3 bits or whatever you like.

I'm sure this code is not the best concerning speed, but I think it is a very clear and clean solution. It is also far from complete, as there is just one getter, one setter and two constructors. Remember to implement error checking for indexes and sizes.

Code:

#include <iostream>
#include <bitset>
#include <cstdint>  // uint8_t
#include <string>   // std::string

template <size_t N> class BitVector
{
private:

   std::bitset<N> _data;

public:

   BitVector (unsigned long num) : _data (num) { };
   BitVector (const std::string& str) : _data (str) { };

   template <size_t M>
   std::bitset<M> getBits (size_t selIdx)
   {
      std::bitset<M> retBitset;
      for (size_t idx = 0; idx < M; ++idx)
      {
         retBitset |= (_data[M * selIdx + idx] << (M - 1 - idx));
      }
      return retBitset;
   }

   template <size_t M>
   void setBits (size_t selIdx, uint8_t num)
   {
      const unsigned char* curByte = reinterpret_cast<const unsigned char*> (&num);
      for (size_t bitIdx = 0; bitIdx < 8; ++bitIdx)
      {
         bool bitSet = (1 == ((*curByte & (1 << (8 - 1 - bitIdx))) >> (8 - 1 - bitIdx)));
         _data.set(M * selIdx + bitIdx, bitSet);
      }
   }

   void print_7_8()
   {
      std:: cout << "\n7 bit representation: ";
      for (size_t idx = 0; idx < (N / 7); ++idx)
      {
         std::cout << getBits<7>(idx) << " ";
      }
      std:: cout << "\n8 bit representation: ";
      for (size_t idx = 0; idx < N / 8; ++idx)
      {
         std::cout << getBits<8>(idx) << " ";
      }
   }
};

int main ()
{
   BitVector<56> num = 127;

   std::cout << "Before changing values...:";
   num.print_7_8();

   num.setBits<8>(0, 0x81);
   num.setBits<8>(1, 0b00110011);
   num.setBits<8>(2, 0b10010101);
   num.setBits<8>(3, 0xAA);
   num.setBits<8>(4, 0x81);
   num.setBits<8>(5, 0xFF);
   num.setBits<8>(6, 0x00);

   std::cout << "\n\nAfter changing values...:";
   num.print_7_8();

   std::cout << "\n\n8 Bits: " << num.getBits<8>(5) << " to ulong: " << num.getBits<8>(5).to_ulong();
   std::cout << "\n7 Bits: " << num.getBits<7>(6) << " to ulong: " << num.getBits<7>(6).to_ulong();

   num = BitVector<56>(std::string("1001010100000100"));
   std::cout << "\n\nAfter changing values...:";
   num.print_7_8();

   return 0;
}
0
On

Process them in groups of 8 (since 8 x 7 bits is exactly 7 bytes, so each group ends on a byte boundary). Bitwise operators are the order of the day here. Handling the last (up to) 7 numbers is a little fiddly, but not impossible. (This code assumes these are unsigned 7-bit integers! Signed conversion would require you to sign-extend each value when bit 6 is 1.)

// convert 8 x 7bit ints in one go
void extract8(const uint8_t input[7], uint8_t output[8])
{
  output[0] =   input[0] & 0x7F;
  output[1] =  (input[0] >> 7)  | ((input[1] << 1) & 0x7F);
  output[2] =  (input[1] >> 6)  | ((input[2] << 2) & 0x7F);
  output[3] =  (input[2] >> 5)  | ((input[3] << 3) & 0x7F);
  output[4] =  (input[3] >> 4)  | ((input[4] << 4) & 0x7F);
  output[5] =  (input[4] >> 3)  | ((input[5] << 5) & 0x7F);
  output[6] =  (input[5] >> 2)  | ((input[6] << 6) & 0x7F);
  output[7] =   input[6] >> 1;
}

// convert array of 7bit ints to 8bit
void seven_bit_to_8bit(const uint8_t* const input, uint8_t* const output, const size_t count)
{
  size_t count8 = count >> 3;
  for(size_t i = 0; i < count8; ++i)
  {
    extract8(input + 7 * i, output + 8 * i);
  }

  // handle the remaining (up to) 7 values
  const size_t countr = (count % 8);
  if(countr)
  {
    // how many bytes do we need to copy from the input?
    size_t remaining_bits = 7 * countr;
    if(remaining_bits % 8)
    {
      // round to next nearest multiple of 8
      remaining_bits += (8 - remaining_bits % 8);
    }
    remaining_bits /= 8;
    {
      uint8_t in[7] = {0}, out[8] = {0};
      for(size_t i = 0; i < remaining_bits; ++i)
      {
        in[i] = input[count8 * 7 + i];
      }
      extract8(in, out);
      for(size_t i = 0; i < countr; ++i)
      {
        output[count8 * 8 + i] = out[i]; // copy the unpacked values, not the raw input
      }
    }
  }
}
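To make the bit layout of extract8 easier to check, here is a sketch of the inverse operation together with a copy of extract8 (repeated only so the block compiles on its own). pack8 is a hypothetical helper, not part of the answer. It places value k at bits 7k to 7k+6 of a 56-bit little-endian word, which is the layout extract8 reads.

```cpp
#include <cstdint>

// Copy of extract8 from the answer: unpack 7 bytes into eight 7-bit values.
inline void extract8(const std::uint8_t input[7], std::uint8_t output[8]) {
    output[0] =  input[0] & 0x7F;
    output[1] = (input[0] >> 7) | ((input[1] << 1) & 0x7F);
    output[2] = (input[1] >> 6) | ((input[2] << 2) & 0x7F);
    output[3] = (input[2] >> 5) | ((input[3] << 3) & 0x7F);
    output[4] = (input[3] >> 4) | ((input[4] << 4) & 0x7F);
    output[5] = (input[4] >> 3) | ((input[5] << 5) & 0x7F);
    output[6] = (input[5] >> 2) | ((input[6] << 6) & 0x7F);
    output[7] =  input[6] >> 1;
}

// Hypothetical inverse: pack eight 7-bit values back into 7 bytes.
inline void pack8(const std::uint8_t input[8], std::uint8_t output[7]) {
    std::uint64_t word = 0;
    for (int k = 0; k < 8; ++k)  // value k goes to bits 7k .. 7k+6
        word |= static_cast<std::uint64_t>(input[k] & 0x7Fu) << (7 * k);
    for (int i = 0; i < 7; ++i)  // emit the word least significant byte first
        output[i] = static_cast<std::uint8_t>(word >> (8 * i));
}
```

Running extract8 followed by pack8 on any 7 bytes should reproduce the original bytes, which makes a convenient round-trip test.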
0
On

You can use direct access or bulk bit packing/unpacking, as in TurboPFor: Integer Compression.

// Direct read access 
// b : bit width 0-16 (7 in your case)

#define bzhi32(u,b) ((u) & ((1u  <<(b))-1))

static inline unsigned  bitgetx16(unsigned char *in, 
                                  unsigned  idx, 
                                  unsigned b) { 
  unsigned bidx = b*idx; 
  return bzhi32( *(unsigned *)((uint16_t *)in+(bidx>>4)) >> (bidx& 0xf), b );
}
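The pointer casts above read a 32-bit word through a uint16_t-based address, which relies on unaligned access, little-endian byte order, and readable memory beyond the last element. A more conservative sketch that assembles the window byte by byte is shown below. It is not TurboPFor code: the name bitget_portable and the size parameter are assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Reads the idx'th b-bit field (LSB-first, b in 0..16) from `in`.
// `size` is the number of valid bytes; bytes past the end read as 0.
inline unsigned bitget_portable(const unsigned char* in, std::size_t size,
                                std::size_t idx, unsigned b) {
    const std::size_t bit = idx * b;
    const std::size_t byte = bit >> 3;

    // Assemble up to 4 bytes, least significant first, without pointer casts.
    std::uint32_t window = 0;
    for (unsigned k = 0; k < 4 && byte + k < size; ++k)
        window |= static_cast<std::uint32_t>(in[byte + k]) << (8 * k);

    // At most 7 offset bits plus 16 field bits are needed, so 32 bits suffice.
    return (window >> (bit & 7)) & ((1u << b) - 1u);
}
```

This trades some speed for portability. Whether the difference matters would need measuring on the target platform.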
0
On

I found this thread while searching for 8-bit to 7-bit conversion, but there was no good answer, so I wrote this code. This is how I think the conversion should be done for small arrays.

#include <iostream>
#include <vector>
#include <bitset>
#include <cstdint>
#include <cstdlib>  // rand

std::vector<uint8_t> convert7bitTo8bit(const std::vector<uint8_t>& input)
{
    std::vector<uint8_t> output;
    int acc = 0;
    int bitCount = 0;

    for (auto byte : input)
    {
        acc |= (byte << bitCount);
        bitCount += 7;

        if (bitCount >= 8)
        {
            output.push_back(acc & 0xFF);
            acc >>= 8;
            bitCount -= 8;
        }
    }

    if (bitCount > 0)
    {
        output.push_back(acc & 0xFF);
    }

    return output;
}

std::vector<uint8_t> convert8bitTo7bit(const std::vector<uint8_t>& input)
{
    std::vector<uint8_t> output;
    int acc = 0;
    int bitCount = 0;

    for (auto byte : input)
    {
        acc |= (byte << bitCount);
        bitCount += 8;

        while (bitCount >= 7)
        {
            output.push_back(acc & 0x7F);
            acc >>= 7;
            bitCount -= 7;
        }
    }

    if (bitCount > 0)
    {
        output.push_back(acc & 0x7F);
    }

    return output;
}

void dump_bits(const char* name, std::vector<uint8_t>& bytes)
{   
    std::cout << name;

    for (uint8_t byte : bytes)
    {   std::cout << " " << std::bitset<8>(byte);
    }
    std::cout << std::endl;
}

int main()
{
    int failedTests = 0;

    for(auto test = 0; test < 100; ++test)
    {   std::vector<uint8_t> originalData;
        originalData.resize(rand() % 20);
        for(std::size_t i = 0; i < originalData.size(); ++i)   // unsigned index matches size()
            originalData[i] = static_cast<uint8_t>(rand() & 0xFF);

        dump_bits("originalData", originalData);

        std::vector<uint8_t> sevenBitData = convert8bitTo7bit(originalData);

        std::vector<uint8_t> restoredData = convert7bitTo8bit(sevenBitData);
        dump_bits("restoredData", restoredData);

        dump_bits("sevenBitData", sevenBitData);

        restoredData.resize(originalData.size()); // restored data may be larger than original data

        if (originalData != restoredData)
        {   ++failedTests;
            std::cout << "TEST FAILED" << std::endl;
        }

        std::cout << std::endl;
    }

    std::cout << "failedTests: " << failedTests << std::endl;

    return 0;
}