GCC / clang lay out derived struct fields in the back-padding of the base struct

I am confused about how gcc and clang lay out structures when both padding and inheritance are involved. Here's a sample program:

#include <string.h>
#include <stdio.h>

struct A
{
    void* m_a;
};

struct B: A
{
    void* m_b1;
    char m_b2;
};

struct B2
{
    void* m_a;
    void* m_b1;
    char m_b2;
};

struct C: B
{
    short m_c;
};

struct C2: B2
{
    short m_c;
};

int main ()
{
    C c;
    memset (&c, 0, sizeof (C));
    memset ((B*) &c, -1, sizeof (B));

    // sizeof yields size_t, so it is printed with %zu rather than %d
    printf (
        "c.m_c = %d; sizeof (A) = %zu sizeof (B) = %zu sizeof (C) = %zu\n",
        c.m_c, sizeof (A), sizeof (B), sizeof (C)
        );

    C2 c2;
    memset (&c2, 0, sizeof (C2));
    memset ((B2*) &c2, -1, sizeof (B2));

    // sizeof yields size_t, so it is printed with %zu rather than %d
    printf (
        "c2.m_c = %d; sizeof (A) = %zu sizeof (B2) = %zu sizeof (C2) = %zu\n",
        c2.m_c, sizeof (A), sizeof (B2), sizeof (C2)
        );

    return 0;
}

      

Output:

$ ./a.out
c.m_c = -1; sizeof (A) = 8 sizeof (B) = 24 sizeof (C) = 24
c2.m_c = 0; sizeof (A) = 8 sizeof (B2) = 24 sizeof (C2) = 32

      

Structures C and C2 are laid out differently. In C, m_c is allocated in the back-padding of struct B and is therefore overwritten by the second memset (); this does not happen with C2.

Compilers used:

$ clang --version
Ubuntu clang version 3.3-16ubuntu1 (branches/release_33) (based on LLVM 3.3)
Target: x86_64-pc-linux-gnu
Thread model: posix

$ c++ --version
c++ (Ubuntu 4.8.2-19ubuntu1) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

      

The same thing happens with the -m32 option (the output sizes will be different, obviously).

Neither the x86 nor the x86_64 version of the Microsoft Visual Studio 2010 C++ compiler has this problem (i.e. they lay out structures C and C2 in the same way).

If this is not a bug and is by design, then my questions are:

  • What are the exact rules for allocating or not allocating derived struct fields in the back-padding (e.g. why doesn't this happen with C2)?
  • Is there a way to override this behavior with switches / attributes (i.e. lay out just like MSVC)?

Thanks in advance.

Vladimir

+7



4 answers


To everyone downvoting this question and the OP's self-answer with righteous indignation about how awfully UB his hand-written memcpy was: please note that the implementors of both libc++ and libstdc++ fall into exactly the same pit. For the foreseeable future, it is actually very important to understand when tail padding is reused and when it is not. Good on the OP for raising the issue.

The rules for struct layout are given by the Itanium C++ ABI. The relevant wording:

If D is a base class, update sizeof (C) to max (sizeof (C), offset (D) + nvsize (D)).

Here, "the dsize, nvsize, and nvalign of these types [POD types] are defined to be their ordinary size and alignment", whereas the nvsize of a non-POD type is defined as "the non-virtual size of an object, which is the size of O without virtual bases" [and also without tail padding]. So if D is a POD, nothing is ever placed in its tail padding; if D is not a POD, the compiler is allowed to place the next member (or base) in its tail padding.



Hence, any code handling a non-POD type (even a trivially copyable one!) must consider the possibility that important data has been placed in its tail padding. This usually violates developers' assumptions about what is acceptable to do with trivially copyable types (namely, that you can trivially copy them).

Test case on Wandbox:

#include <algorithm>
#include <stdio.h>

struct A {
    int m_a;
};

struct B : A {
    int m_b1;
    char m_b2;
};

struct C : B {
    short m_c;
};

int main() {
    C c1 { 1, 2, 3, 4 };
    B& b1 = c1;
    B b2 { 5, 6, 7 };

    printf("before operator=: %d\n", int(c1.m_c));  // 4
    b1 = b2;
    printf("after operator=: %d\n", int(c1.m_c));  // 4

    printf("before std::copy: %d\n", int(c1.m_c));  // 4
    std::copy(&b2, &b2 + 1, &b1);
    printf("after std::copy: %d\n", int(c1.m_c));  // 64, or 0, or anything but 4
}

      

+4



Your code exhibits undefined behavior, since C and C2 are not PODs and overwriting arbitrary bits of their data with memset or memcpy is not allowed.

However, in the slightly longer term, this is a complicated issue. The existing C++ ABI on the platform (Unix) permitted this behavior (it was written for C++98, which allowed it). The committee then changed the rules incompatibly in C++03 and C++11. Clang at least has a switch to move to the newer rules. Of course, the C++ ABI on Unix has not changed to accommodate the new C++11 rules for placing things in padding, so compilers cannot simply update, as that would break the entire ABI.



I believe GCC is holding back its ABI-breaking changes for 5.0, and this may be one of them.

Windows has always prohibited this practice in its C++ ABI and hence does not have this problem, as far as I know.

+1



The difference is that the compiler is allowed to use the padding of a preceding object if that object is no longer "just data" and manipulating it with memcpy is not supported.

A struct B is not just data, because it is a derived object, and therefore its slack space can be used: if you memcpy a B instance around, you are already breaking the contract.

B2, in contrast, is just a structure, and backward compatibility requires that its whole size (including the slack space) be plain memory that your code is allowed to play with using memcpy.

+1



Thanks everyone for your help.

Bottom line: C++ compilers are allowed to reuse the tail padding of non-POD structures when laying out the fields of derived structures. Both GCC and clang use this permission; MSVC does not. GCC has a -Wabi warning flag that is supposed to help catch potential ABI incompatibilities, but it did not issue any warnings for the sample above.

It looks like the only way to prevent this is to introduce explicit tail-padding fields.

0

