Accessing 8-bit data as 7-bit

Question


I have an array of 100 uint8_t that needs to be processed as an 800-bit stream, 7 bits at a time. In other words, if the first element of the 8-bit array contains 0b11001100 and the second has the value 0b11110000, then when I read it in 7-bit format, the first element of the 7-bit array will be 0b1100110, the second will be 0b0111100, and the remaining 2 bits will be held in the third. The first thing I tried was a union:

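#include <cstdint>

struct uint7_t {
    uint8_t i1:7;     // 7-bit bit-field, but each struct still occupies a whole byte
};

union uint7_8_t {
    uint8_t u8[100];
    uint7_t u7[115];  // 800 / 7 = 114.3, rounded up
};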

but of course every element is still byte-aligned, so I am essentially just losing the 8th bit of each byte.

Does anyone know how I can do this?

To be clear, this is roughly what the union does:

xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx   32 bits of 8-bit data
0xxxxxxx 0xxxxxxx 0xxxxxxx 0xxxxxxx   32 bits of 7-bit data

And this is what I want instead:

xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx   32 bits of 8-bit data
xxxxxxx xxxxxxx xxxxxxx xxxxxxx xxxx  32 bits of 7-bit data

I know the last bits will need padding, but that's OK; I just want to access the data 7 bits at a time without losing any of the 800 bits. So far, the only way I can think of is a lot of bit shifting, which will of course work, but I'm sure there is a cleaner way to do this.

Thanks in advance for any answers.
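For reference, the mapping described in the question can be pinned down with a deliberately simple bit-at-a-time reader. This is a minimal sketch, not part of the original post: the function name get7_bitwise is hypothetical, and it assumes the stream is read most-significant-bit first within each byte (as in the 0b11001100 / 0b11110000 example) and that bits past the end read as 0.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical reference reader: returns the k-th 7-bit value of an n-byte
// buffer, treating the buffer as one MSB-first bit stream (bit 7 of byte 0
// comes first). Bits past the end of the buffer read as 0 (padding).
unsigned get7_bitwise(const uint8_t *in, std::size_t n, std::size_t k)
{
    unsigned value = 0;
    for (std::size_t i = 0; i < 7; ++i) {
        const std::size_t bit = 7 * k + i;   // absolute bit position in the stream
        const std::size_t byte = bit / 8;
        unsigned b = 0;
        if (byte < n)
            b = (in[byte] >> (7 - bit % 8)) & 1u;  // MSB-first within each byte
        value = (value << 1) | b;            // append the bit to the result
    }
    return value;
}
```

Applied to {0b11001100, 0b11110000} it yields 0b1100110, 0b0111100 and then 0b0000000 for the padded third value; 100 input bytes give 115 values (800 / 7, rounded up). It is slow, but handy as a baseline for checking faster versions.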


c++ arrays unions 8bit 7bit

Howard_Schmidtt, Jun 28 '17 at 18:49


7 answers

Zalman Stern · Answer 1 · 2017-06-28T19:30:24+0000

Not sure what you mean by "cleaner". Typically, people who work on problems like this regularly consider shifting and masking the right primitive tools to use. You can define something like a bit stream abstraction with a method that reads an arbitrary number of bits from the stream; this abstraction sometimes shows up in compression applications. The internals of that method, of course, use shifting and masking.
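A minimal sketch of such a bit stream abstraction, assuming MSB-first bit order as in the question. The class name BitReader and its read() method are hypothetical and not taken from any particular library; internally it is just shifting and masking over a small bit accumulator.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical minimal bit-stream reader: reads MSB-first bit fields of
// 1..25 bits from a byte buffer. Bits past the end of the buffer read as 0.
class BitReader {
public:
    BitReader(const uint8_t *data, std::size_t size) : data_(data), size_(size) {}

    // Read the next `nbits` bits (1..25) and return them right-aligned.
    uint32_t read(unsigned nbits)
    {
        // Top up the accumulator one byte at a time until it holds enough bits.
        while (count_ < nbits) {
            const uint32_t next = (pos_ < size_) ? data_[pos_] : 0u;  // zero padding
            ++pos_;
            acc_ = (acc_ << 8) | next;
            count_ += 8;
        }
        // The wanted field sits in the top `nbits` of the `count_` valid bits.
        const uint32_t value = (acc_ >> (count_ - nbits)) & ((1u << nbits) - 1u);
        count_ -= nbits;                  // at most 7 bits remain afterwards
        acc_ &= (1u << count_) - 1u;      // drop the consumed bits
        return value;
    }

private:
    const uint8_t *data_;
    std::size_t size_;
    std::size_t pos_ = 0;   // next byte to load
    uint32_t acc_ = 0;      // bit accumulator
    unsigned count_ = 0;    // number of valid bits in acc_
};
```

Calling read(7) repeatedly on {0b11001100, 0b11110000} returns 0b1100110, 0b0111100, 0b0000000. Because the accumulator is a 32-bit integer, this version is limited to fields of at most 25 bits.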

One fairly clean approach is to write a function that retrieves a 7-bit number at any bit index in an unsigned char array. Use division to convert the bit index to a byte index and modulus to get the bit index within that byte. Then shift and mask. The input bits can span two bytes, so you either need to glue together a 16-bit value before extracting, or do two smaller extractions and OR them together to build the result.
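A sketch of that function, assuming MSB-first bit order as in the question; get7_msb is a hypothetical name, and treating bits past the end of the buffer as zero padding is an assumption. It glues the two bytes the value can span into a 16-bit window and then shifts and masks:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns the k-th 7-bit value from an n-byte buffer,
// reading bits MSB-first. Bytes past the end of the buffer read as 0.
unsigned get7_msb(const uint8_t *in, std::size_t n, std::size_t k)
{
    const std::size_t bit = 7 * k;        // bit index of the value's first bit
    const std::size_t byte = bit / 8;     // division -> byte index
    const unsigned offset = bit % 8;      // modulus  -> bit index within that byte
    const unsigned hi = (byte < n) ? in[byte] : 0u;
    const unsigned lo = (byte + 1 < n) ? in[byte + 1] : 0u;
    const unsigned window = (hi << 8) | lo;   // 16 bits starting at `byte`
    // The value occupies window bits 15-offset down to 9-offset.
    return (window >> (9 - offset)) & 0x7Fu;
}
```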

If I were aiming for something moderately performant, I would probably take one of two approaches:

The first keeps two state variables that indicate how many bits to take from the current and next bytes. It uses shifts, masks and bitwise OR to build the current output (for example, a number from 0 to 127 as an int); the loop then updates both state variables using addition and modulo, and advances the current byte pointer once all the bits in the current byte have been used.
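A rough sketch in the spirit of this first approach (unpack7 is a hypothetical name; MSB-first bit order and zero padding of the last value are assumptions). Here the two per-step counts, from_cur and from_next, are derived from a single bit offset that is advanced by addition and modulo:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: unpacks an n-byte buffer into 7-bit values (one per
// output byte), reading bits MSB-first; the last value is zero-padded.
std::vector<uint8_t> unpack7(const uint8_t *in, std::size_t n)
{
    std::vector<uint8_t> out;
    const std::size_t count = (8 * n + 6) / 7;  // number of values, rounded up
    std::size_t p = 0;       // index of the current byte
    unsigned used = 0;       // bits of in[p] already consumed (0..7)
    for (std::size_t k = 0; k < count; ++k) {
        // How many of the 7 bits come from the current and the next byte.
        const unsigned from_cur = (8 - used < 7) ? 8 - used : 7;
        const unsigned from_next = 7 - from_cur;
        const unsigned cur = in[p];
        const unsigned next = (from_next != 0 && p + 1 < n) ? in[p + 1] : 0u;
        // Take the unread bits of cur (shift + mask), making room for the rest...
        unsigned v = ((cur >> (8 - used - from_cur)) & ((1u << from_cur) - 1u)) << from_next;
        // ...and OR in the top from_next bits of the next byte.
        v |= next >> (8 - from_next);
        out.push_back(static_cast<uint8_t>(v));
        // Advance the bit offset by 7: carry into the byte index, keep the rest.
        p += (used + 7) / 8;
        used = (used + 7) % 8;
    }
    return out;
}
```

For the question's 100-byte array this produces 115 values, the last one holding the final 2 bits followed by 5 padding zeros.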

The second approach is to load 56 bits (the 7 input bytes that hold 8 outputs) into a 64-bit integer and use fully unrolled code to extract each of the 8 outputs. Doing this without unaligned memory reads requires assembling the 64-bit integer piece by piece. (56 bits is special because every group starts at a byte-aligned bit position.)
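A sketch of the second approach, assuming MSB-first bit order as in the question; unpack8x7 is a hypothetical name. The 64-bit integer is built from the 7 bytes with shifts, which avoids unaligned reads and also makes the result independent of the CPU's byte order:

```cpp
#include <cstdint>

// Hypothetical helper: 7 input bytes hold exactly 8 values of 7 bits each.
void unpack8x7(const uint8_t in[7], uint8_t out[8])
{
    uint64_t v = 0;
    for (int i = 0; i < 7; ++i)
        v = (v << 8) | in[i];   // v holds the 56 bits in its low bits, MSB-first
    // Fully unrolled extraction: value j occupies bits 55-7j down to 49-7j.
    out[0] = static_cast<uint8_t>((v >> 49) & 0x7F);
    out[1] = static_cast<uint8_t>((v >> 42) & 0x7F);
    out[2] = static_cast<uint8_t>((v >> 35) & 0x7F);
    out[3] = static_cast<uint8_t>((v >> 28) & 0x7F);
    out[4] = static_cast<uint8_t>((v >> 21) & 0x7F);
    out[5] = static_cast<uint8_t>((v >> 14) & 0x7F);
    out[6] = static_cast<uint8_t>((v >> 7) & 0x7F);
    out[7] = static_cast<uint8_t>(v & 0x7F);
}
```

For the question's 100 bytes this covers 14 full groups (98 bytes, 112 values); the last 2 bytes still need one zero-padded group, which yields the final 3 values.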

To go really fast, I might try writing SIMD code in Halide. I suspect that's out of scope here. (And it's not clear it would actually gain much.)

Constructs that read more than one byte into an integer at a time will probably have to take the processor's byte order into account.

wally · Answer 2 · 2017-06-28T21:07:49+0000

Here is a solution that uses the std::vector<bool> specialization. It uses the same mechanism as std::vector<bool> itself (proxy reference objects) to access seven-bit elements.

Member functions allow the following operations:

uint7_t x{5};               // simple value
Arr<uint7_t> arr(10);       // array of size 10
arr[0] = x;                 // set element
uint7_t y = arr[0];         // get element
arr.push_back(uint7_t{9});  // add element
arr.push_back(x);           // add another element
std::cout << "Array size is " 
    << arr.size() << '\n';  // get size
for(auto&& i : arr) 
    std::cout << i << '\n'; // range-for to read values
int z{50};
for(auto&& i : arr)
    i = z++;                // range-for to change values
auto&& v = arr[1];          // get reference to second element
v = 99;                     // change second element via reference

Complete program:

#include <vector>
#include <iterator>
#include <iostream>

struct uint7_t {
    unsigned int i : 7;
};

struct seven_bit_ref {
    size_t begin;
    size_t end;
    std::vector<bool>& bits;

    seven_bit_ref& operator=(const uint7_t& right)
    {
        auto it{bits.begin()+begin};
        for(int mask{1}; mask != 1 << 7; mask <<= 1)
            *it++ = right.i & mask;
        return *this;
    }

    operator uint7_t() const
    {
        uint7_t r{};
        auto it{bits.begin() + begin};
        for(int i{}; i < 7; ++i)
            r.i += *it++ << i;
        return r;
    }

    seven_bit_ref operator*()
    {
        return *this;
    }

    void operator++()
    {
        begin += 7;
        end += 7;
    }

    bool operator!=(const seven_bit_ref& right)
    {
        return !(begin == right.begin && end == right.end);
    }

    seven_bit_ref operator=(int val)
    {
        uint7_t temp{};
        temp.i = val;
        operator=(temp);
        return *this;
    }

};

template<typename T>
class Arr;

template<>
class Arr<uint7_t> {
public:
    Arr(size_t size) : bits(size * 7, false) {}

    seven_bit_ref operator[](size_t index)
    {
        return {index * 7, index * 7 + 7, bits};
    }
    size_t size()
    {
        return bits.size() / 7;
    }
    void push_back(uint7_t val)
    {
        for(int mask{1}; mask != 1 << 7; mask <<= 1){
            bits.push_back(val.i & mask);
        }
    }

    seven_bit_ref begin()
    {
        return {0, 7, bits};
    }

    seven_bit_ref end()
    {
        return {size() * 7, size() * 7 + 7, bits};
    }

    std::vector<bool> bits;
};

std::ostream& operator<<(std::ostream& os, uint7_t val)
{
    os << val.i;
    return os;
}

int main()
{
    uint7_t x{5};               // simple value
    Arr<uint7_t> arr(10);       // array of size 10
    arr[0] = x;                 // set element
    uint7_t y = arr[0];         // get element
    arr.push_back(uint7_t{9});  // add element
    arr.push_back(x);           // add another element
    std::cout << "Array size is " 
        << arr.size() << '\n';  // get size
    for(auto&& i : arr) 
        std::cout << i << '\n'; // range-for to read values
    int z{50};
    for(auto&& i : arr)
        i = z++;                // range-for to change values
    auto&& v = arr[1];          // get reference
    v = 99;                     // change via reference
    std::cout << "\nAfter changes:\n";
    for(auto&& i : arr)
        std::cout << i << '\n';
}

Andre Kampling · Answer 3 · 2017-06-29T18:15:09+0000

The following code does what you asked for. First, here is the output (there is also a live example on ideone):

Output:

Before changing values...:
7 bit representation: 1111111 0000000 0000000 0000000 0000000 0000000 0000000 0000000 
8 bit representation: 11111110 00000000 00000000 00000000 00000000 00000000 00000000 

After changing values...:
7 bit representation: 1000000 1001100 1110010 1011010 1010100 0000111 1111110 0000000 
8 bit representation: 10000001 00110011 10010101 10101010 10000001 11111111 00000000 

8 Bits: 11111111 to ulong: 255
7 Bits: 1111110 to ulong: 126

After changing values...:
7 bit representation: 0010000 0101010 0100000 0000000 0000000 0000000 0000000 0000000 
8 bit representation: 00100000 10101001 00000000 00000000 00000000 00000000 00000000

It's quite easy using std::bitset wrapped in a class called BitVector. I implemented one getter and one setter. The getter returns a std::bitset for the given index selIdx, with the size given by the template argument M. The index is multiplied by M to get the correct bit position. The returned bitset can also be converted to numeric or string values.
The setter takes a uint8_t value as input, again together with an index selIdx, and writes all 8 bits of that value into the bitset starting at bit M * selIdx.

Furthermore, you can use the getter and setter with different sizes thanks to the template argument M, which means you can work with a 7- or 8-bit representation, but also 3 bits or whatever you like.

I'm sure this code isn't the best in terms of speed, but I think it's a very clear and clean solution. It is also incomplete: there is only one getter, one setter and two constructors. Remember to add error checking for indices and sizes.

Code:

#include <iostream>
#include <bitset>
#include <cstdint>
#include <string>

template <size_t N> class BitVector
{
private:

   std::bitset<N> _data;

public:

   BitVector (unsigned long num) : _data (num) { };
   BitVector (const std::string& str) : _data (str) { };

   template <size_t M>
   std::bitset<M> getBits (size_t selIdx)
   {
      std::bitset<M> retBitset;
      for (size_t idx = 0; idx < M; ++idx)
      {
         retBitset |= (_data[M * selIdx + idx] << (M - 1 - idx));
      }
      return retBitset;
   }

   template <size_t M>
   void setBits (size_t selIdx, uint8_t num)
   {
      const unsigned char* curByte = reinterpret_cast<const unsigned char*> (&num);
      for (size_t bitIdx = 0; bitIdx < 8; ++bitIdx)
      {
         bool bitSet = (1 == ((*curByte & (1 << (8 - 1 - bitIdx))) >> (8 - 1 - bitIdx)));
         _data.set(M * selIdx + bitIdx, bitSet);
      }
   }

   void print_7_8()
   {
      std::cout << "\n7 bit representation: ";
      for (size_t idx = 0; idx < (N / 7); ++idx)
      {
         std::cout << getBits<7>(idx) << " ";
      }
      std::cout << "\n8 bit representation: ";
      for (size_t idx = 0; idx < N / 8; ++idx)
      {
         std::cout << getBits<8>(idx) << " ";
      }
   }
};

int main ()
{
   BitVector<56> num = 127;

   std::cout << "Before changing values...:";
   num.print_7_8();

   num.setBits<8>(0, 0x81);
   num.setBits<8>(1, 0b00110011);
   num.setBits<8>(2, 0b10010101);
   num.setBits<8>(3, 0xAA);
   num.setBits<8>(4, 0x81);
   num.setBits<8>(5, 0xFF);
   num.setBits<8>(6, 0x00);

   std::cout << "\n\nAfter changing values...:";
   num.print_7_8();

   std::cout << "\n\n8 Bits: " << num.getBits<8>(5) << " to ulong: " << num.getBits<8>(5).to_ulong();
   std::cout << "\n7 Bits: " << num.getBits<7>(6) << " to ulong: " << num.getBits<7>(6).to_ulong();

   num = BitVector<56>(std::string("1001010100000100"));
   std::cout << "\n\nAfter changing values...:";
   num.print_7_8();

   return 0;
}

K. Kirsz · Answer 4 · 2017-06-28T19:18:47+0000

Here's one approach without manual bit shifting. This is just a rough proof of concept, but hopefully you can learn something from it. I don't know if you can easily convert your input to a bitset, but I think it should be possible.

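#include <bitset>
#include <iostream>
#include <string>

int main()
{
    int bytes = 0x01234567;
    std::bitset<32> bs(bytes);
    std::cout << "Input: " << bs << '\n';
    for(int i = 0; i < 5; i++)
    {
        std::bitset<7> slice(bs.to_string().substr(i*7, 7)); // last slice has only 4 bits
        std::cout << slice << '\n';
    }
}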

Also, it's probably much less efficient than the bit-shifting version, so I wouldn't recommend it for heavy workloads.

geza · Answer 5 · 2017-06-28T19:23:33+0000

You can use this to get the 7-bit element at position index from in (note that it does not handle the end of the array properly: for the last element it reads one byte past the end). Simple and fast.

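#include <cstdint>

// Returns the 7-bit element number `index` (bits numbered LSB-first).
// Always reads in[idx + 1], so the buffer needs one extra byte of padding.
int get7(const uint8_t *in, int index) {
    int fidx = index*7;   // first bit of the element
    int idx = fidx>>3;    // byte index
    int sidx = fidx&7;    // bit offset within that byte

    return (in[idx]>>sidx|in[idx+1]<<(8-sidx))&0x7f;
}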
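A bounds-checked variant of geza's function, as a sketch: get7_safe is a hypothetical name, and treating missing bits as 0 is an assumption. Like the original, it numbers bits LSB-first within each byte, which is the opposite of the order in the question's example (it returns 0b1001100, not 0b1100110, for the first element of {0b11001100, 0b11110000}):

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical variant that takes the buffer length, so the last element
// never reads past the end; missing bits read as 0. Bits are LSB-first.
unsigned get7_safe(const uint8_t *in, std::size_t n, std::size_t index)
{
    const std::size_t fidx = index * 7;   // first bit of the element
    const std::size_t idx = fidx >> 3;    // byte index
    const unsigned sidx = fidx & 7;       // bit offset within that byte
    const unsigned lo = (idx < n) ? in[idx] : 0u;
    const unsigned hi = (idx + 1 < n) ? in[idx + 1] : 0u;
    return ((lo >> sidx) | (hi << (8 - sidx))) & 0x7Fu;
}
```

An alternative that keeps the original function unchanged is to allocate the array with one extra zero byte at the end.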

robthebloke · Answer 6 · 2017-06-29T03:06:49+0000

Process them in groups of 8 (since 8 x 7 = 56 bits is a whole number of bytes). Bitwise operators are the order of the day here. Dealing with the last (up to) 7 numbers is a little awkward, but not impossible. (This code assumes they are unsigned 7-bit integers! A signed conversion would require you to set the top bit when bit 6 is 1, i.e. sign-extend.)

#include <cstddef>
#include <cstdint>

// convert 8 x 7bit ints in one go (bits are packed LSB-first)
void extract8(const uint8_t input[7], uint8_t output[8])
{
  output[0] =   input[0] & 0x7F;
  output[1] =  (input[0] >> 7)  | ((input[1] << 1) & 0x7F);
  output[2] =  (input[1] >> 6)  | ((input[2] << 2) & 0x7F);
  output[3] =  (input[2] >> 5)  | ((input[3] << 3) & 0x7F);
  output[4] =  (input[3] >> 4)  | ((input[4] << 4) & 0x7F);
  output[5] =  (input[4] >> 3)  | ((input[5] << 5) & 0x7F);
  output[6] =  (input[5] >> 2)  | ((input[6] << 6) & 0x7F);
  output[7] =   input[6] >> 1;
}

// convert array of 7bit ints to 8bit
void seven_bit_to_8bit(const uint8_t* const input, uint8_t* const output, const size_t count)
{
  size_t count8 = count >> 3;
  for(size_t i = 0; i < count8; ++i)
  {
    extract8(input + 7 * i, output + 8 * i);
  }

  // handle the remaining (up to) 7 values
  const size_t countr = (count % 8);
  if(countr)
  {
    // how many input bytes hold the remaining 7 * countr bits? (rounded up)
    const size_t remaining_bytes = (7 * countr + 7) / 8;
    uint8_t in[7] = {0}, out[8] = {0};
    for(size_t i = 0; i < remaining_bytes; ++i)
    {
      in[i] = input[count8 * 7 + i];
    }
    extract8(in, out);
    for(size_t i = 0; i < countr; ++i)
    {
      output[count8 * 8 + i] = out[i];  // copy the unpacked values, not the packed input
    }
  }
}

powturbo · Answer 7 · 2017-06-30T12:06:11+0000

You can use direct-access bit packing/unpacking, or bulk packing, as in TurboPFor: Integer Compression:

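#include <cstdint>
#include <cstring>

// Direct read access
// b : bit width 0-16 (7 in your case)
// Assumes a little-endian CPU; reads 4 bytes starting at the 16-bit word that
// contains the value, so the buffer needs a few bytes of padding at the end.

#define bzhi32(u,b) ((u) & ((1u << (b)) - 1))

static inline unsigned bitgetx16(const unsigned char *in,
                                 unsigned idx,
                                 unsigned b) {
  unsigned bidx = b*idx;                            // bit offset of the value
  uint32_t w;
  std::memcpy(&w, in + 2*(bidx >> 4), sizeof w);    // unaligned-safe 32-bit load
  return bzhi32(w >> (bidx & 0xf), b);
}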
