Bitfield and Union - unexpected result in C

I have been assigned the following assignment in the C-course:

[The assignment was given as an image; it is not reproduced here.]

Following the assignment, I decode the 8-byte integer 131809282883593 like this:

    #include <stdio.h>
    #include <string.h>

    struct Message {
         unsigned int hour : 5;
         unsigned int minutes : 6;
         unsigned int seconds : 6;
         unsigned int day : 5;
         unsigned int month : 4;
         unsigned int year : 12;
         unsigned long long int code : 26;
    };  // 64 bit in total

    union Msgdecode {
        long long  int datablob;
        struct Message elems;
    };

    int main(void) {

        long long int datablob = 131809282883593;
        union Msgdecode m;

        m.datablob = datablob;

        printf("%d:%d:%d %d.%d.%d code:%lu\n", m.elems.hour, m.elems.minutes,
        m.elems.seconds, m.elems.day, m.elems.month, m.elems.year,(long unsigned int) m.elems.code); 

        union Msgdecode m2;
        m2.elems.hour = 9;
        m2.elems.minutes = 0;
        m2.elems.seconds = 0;
        m2.elems.day = 30;
        m2.elems.month = 5;
        m2.elems.year = 2017;
        m2.elems.code = 4195376;

        printf("m2.datablob: should: 131809282883593 is: %lld\n", m2.datablob); //WHY does m2.datablob != m.datablob?! 
        printf("m.datablob:  should: 131809282883593 is: %lld\n", m.datablob);

        printf("%d:%d:%d %d.%d.%d code:%lu\n", m2.elems.hour, m2.elems.minutes,
          m2.elems.seconds, m2.elems.day, m2.elems.month, m2.elems.year, (long unsigned int) m2.elems.code);

    }

      

Try it online.

What puzzles me is the result. The decoding/encoding itself works: 9:0:0 30.5.2017 and code 4195376 come out as expected. But the difference between the two datablob values does not, and I can't figure out why or where it happens:

9:0:0 30.5.2017 code:4195376
m2.datablob: should: 131809282883593 is: 131810088189961
m.datablob:  should: 131809282883593 is: 131809282883593
9:0:0 30.5.2017 code:4195376

      

As you can see, m2's datablob is close to the original, but not identical to it. I consulted a colleague, but we could not work out the reason for this behavior.

Q: Why do the two datablobs differ from each other?

Bonus Q: When the union Msgdecode is extended with another field, a strange thing happens:

union Msgdecode {
    long long  int datablob;
    struct Message elems;
    char bytes[8];  // added this
};

      

Result:

9:0:0 30.5.2017 code:0
m2.datablob: should: 131809282883593 is: 8662973939721
m.datablob:  should: 131809282883593 is: 131809282883593
9:0:0 30.5.2017 code:4195376

      

PS: Reading SO questions about bit-fields and unions gave me the impression that they are unreliable. Is that a fair thing to say?



3 answers


The layout of the bit-fields within the struct, and any padding that may exist between them, is implementation-defined.

From section 6.7.2.1 of the C standard:

11 An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.

This means that you cannot rely on the layout in a standard way.

That being said, let's take a look at how the bits are laid out in this particular case. To reiterate, everything from here on is implementation-specific behavior. We'll start with the second case, where m2.datablob is 8662973939721, since it's easier to explain.

Let's first look at the bit representations of the values you assign to m2:

 - hour:       9:   0 1001 (0x09)
 - minutes:    0:   00 0000 (0x00)
 - seconds:    0:   00 0000 (0x00)
 - day:       30:    1 1110 (0x1E)
 - month:      5:   0101 (0x05)
 - year:    2017:   0111 1110 0001 (0x7e1)
 - code: 4195376:   00 0100 0000 0000 0100 0011 0000 (0x0400430)

      

Now let's look at the blob values: first m, which was assigned the blob value directly, then m2, which had each field assigned individually with the values above:

131809282883593  0x77E13D7C0009                        0111 0111 1110 0001 
                                   0011 1101 0111 1100 0000 0000 0000 1001

  8662973939721  0x07E1017C0009                        0000 0111 1110 0001 
                                   0000 0001 0111 1100 0000 0000 0000 1001

      

If we read the bits starting from the right (the low-order end), we see the value 9 in the first 5 bits. This is followed by two sets of 6 zero bits for the next two fields. After that come the bit patterns for 30, then 5.
A little further on we see the bit pattern for 2017, but there are 6 bits between this value and the previous fields. So it looks like this:

          year        ???  month  day   sec     min  hour
      ------------   -----  ---  ----  ------  ----- -----
     |            | |     ||   ||    ||      ||     |     |
0000 0111 1110 0001 0000 0001 0111 1100 0000 0000 0000 1001

      

So there are some padding bits between the year and month fields. Comparing the representations of m and m2, the differences are the 6 padding bits between month and year and the 4 bits to the left of year.

What we don't see anywhere here are the bits for the code field. So how big is this structure, really?

If we add this to the code:

printf("size = %zu\n", sizeof(struct Message));

      

We get:



size = 16

      

That's much bigger than expected. So let's add an array unsigned char bytes[16] to the union and dump it. Code:

int i;
printf("m: ");
for (i=0; i<16; i++) {
    printf(" %02x", m.bytes[i]);
}
printf("\n");
printf("m2:");
for (i=0; i<16; i++) {
    printf(" %02x", m2.bytes[i]);
}
printf("\n");

      

Output:

m:  09 00 7c 3d e1 77 00 00 00 00 00 00 00 00 00 00
m2: 09 00 7c 01 e1 07 00 00 30 04 40 00 00 00 00 00

      

We can now see the bit pattern 0x0400430 for the code field in the representation of m2, with another 20 bits of padding before it. Also note that the bytes are stored in reverse order of the value, which tells us this is a little-endian machine. Given how the values are laid out, it is also likely that the bits within each byte are ordered low-order first.

So why the padding? It is most likely due to alignment. The first 5 fields are 8 bits or smaller, meaning each fits within a byte. Single bytes have no alignment requirement, so they are packed together. The next field is 12 bits, which means it must fit into a 16-bit (2-byte) unit; this adds 6 bits of padding so that the field starts on a 2-byte boundary. The next field is 26 bits, which requires a 32-bit unit. That would mean starting at a 4-byte offset and using 4 bytes; however, since this field is declared unsigned long long, which in this case is 8 bytes, the field uses 8 bytes. If you were to declare this field unsigned int, it would probably still start at the same offset, but would use only 4 bytes instead of 8.

Now what about the first case, where the blob value is 131810088189961? Let's compare its representation against the "expected" one:

131809282883593  0x77E13D7C0009                        0111 0111 1110 0001 
                                   0011 1101 0111 1100 0000 0000 0000 1001

131810088189961  0x77E16D7C0009                        0111 0111 1110 0001 
                                   0110 1101 0111 1100 0000 0000 0000 1001

      

These two representations hold the same values in the bits that actually store the data. The difference between them is in the 6 padding bits between the month and year fields. As to why this representation differs, the compiler probably performed some optimization when it determined that certain bits could never be read or written. Adding the char array to the union made it possible for those bits to be read or written, so the optimization no longer applies.

With gcc, you can try using __attribute__((packed)) on the struct. This gives the following output (after changing the bytes array back to 8 elements, along with the loop limits when printing):

size = 8
9:0:0 30.5.2127 code:479
m2.datablob: should: 131809282883593 is: 1153216309106573321
m.datablob:  should: 131809282883593 is: 131809282883593
9:0:0 30.5.2017 code:4195376
m:  09 00 7c 3d e1 77 00 00
m2: 09 00 7c 85 1f 0c 01 10

      

And the bit representation:

1153216309106573321 0x10010C1F857C0009   0001 0000 0000 0001 0000 1100 0001 1111
                                         1000 0101 0111 1100 0000 0000 0000 1001

    131810088189961 0x77E16D7C0009       0000 0000 0000 0000 0111 0111 1110 0001 
                                         0110 1101 0111 1100 0000 0000 0000 1001

      

Even so, you may run into problems.

So, to summarize: there are no layout guarantees with bit-fields. You are better off using bit shifting and masking to get values into and out of a blob rather than trying to overlay a struct of bit-fields onto it.



The problem here is line 37, where you do:

m2.elems.code = 4195376;

      

You used an invalid type for a bit-field:

struct Message {
     unsigned int hour : 5;
     unsigned int minutes : 6;
     unsigned int seconds : 6;
     unsigned int day : 5;
     unsigned int month : 4;
     unsigned int year : 12;
     unsigned long long int code : 26; <-- invalid
};

      



See https://www.tutorialspoint.com/cprogramming/c_bit_fields.htm, under "Bit Field Declaration".

It says that you can use int, signed int, and unsigned int as the type.

I think the compiler interprets m2.elems.code as an int, and I don't know what exactly it does with an assignment greater than the maximum int.



To reiterate, the arrangement of bits within bit-field structs is not guaranteed (it depends on the compiler), so overlaying a union on bit-fields like this is not good practice. To achieve this functionality portably, you should use explicit bit manipulation.

Quick example:

#include <stdio.h>

#define HOUR_BIT_START 0    // Bit offset where the hour bits start (lowest 5 bits in this layout)
#define HOUR_BIT_MASK  0x1F // Mask of 5 bits for the hour field

unsigned int getHour(unsigned long long int blob)
{
    return (unsigned int)((blob >> HOUR_BIT_START) & HOUR_BIT_MASK);
}

int main(int argc, char *argv[])
{
    unsigned long long int datablob = 131809282883593;

    // getMinutes(), getSeconds(), etc. follow the same pattern
    // with their own offsets and masks
    printf("%u:%u:%u %u.%u.%u code:%u\n", getHour(datablob), getMinutes(datablob),
           getSeconds(datablob), getDay(datablob), getMonth(datablob),
           getYear(datablob), getCode(datablob));
}

      

I'll leave the implementation of the other get*() functions as an exercise.







