Managing C++ objects in a buffer with alignment and layout assumptions

I am storing objects in a buffer. Now I know that I cannot make assumptions about the memory layout of an object.

If I know the total size of the object, is it permissible to create a pointer to that memory and call functions on it?

For example, let's say I have the following class:

[int,int,int,int,char,padding*3bytes,unsigned short int*]

      

1) If I know that this class has size 24 and I know the address where it starts in memory, then, although it is unsafe to assume the memory layout, is it acceptable to cast that address to a pointer and call functions on the object that access those members? (Does C++ know by some magic the correct position of each member?)

2) If this is unsafe / not OK, is there any way other than using a constructor that takes all the arguments and pulls each argument out of the buffer one at a time?

Edit: Changed the title to make it more appropriate for what I'm asking.

+1




8 answers


You can create a constructor that takes all the members and assigns them, and then use placement new.

#include <new>   // placement new

class Foo
{
    int a; int b; int c; int d; char e; unsigned short int* f;
public:
    Foo(int A, int B, int C, int D, char E, unsigned short int* F) : a(A), b(B), c(C), d(D), e(E), f(F) {}
};

...
char *buf = new char[sizeof(Foo)];           // pre-allocated buffer
Foo *foo = new (buf) Foo(a, b, c, d, e, f);  // construct in place; a..f are values obtained elsewhere

      

This has the advantage that even the v-table will be set up correctly. Note, however, that if you use this for serialization, the unsigned short int pointer will not point to anything useful when you deserialize it, unless you are very careful to use some method that converts pointers to offsets and back again.

Individual (non-virtual) methods are statically linked and are simply a direct function call, with this being the first parameter before the explicit parameters.
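As a rough sketch of the placement-new approach with the alignment made explicit: the buffer below is declared with alignas, and the object is destroyed by hand because placement new has no matching delete. The names Widget, make_widget_in and demo_sum are hypothetical and stand in for the class from the question.

```cpp
#include <cassert>
#include <cstdint>
#include <new>

// Hypothetical stand-in for the class from the question.
struct Widget {
    int a, b, c, d;
    char e;
    unsigned short* f;
    Widget(int A, int B, int C, int D, char E, unsigned short* F)
        : a(A), b(B), c(C), d(D), e(E), f(F) {}
    int sum() const { return a + b + c + d; }
};

// Constructs a Widget inside caller-provided storage and returns it.
// The storage must hold at least sizeof(Widget) bytes and be aligned for Widget.
inline Widget* make_widget_in(void* storage, int a, int b, int c, int d,
                              char e, unsigned short* f) {
    assert(reinterpret_cast<std::uintptr_t>(storage) % alignof(Widget) == 0);
    return new (storage) Widget(a, b, c, d, e, f);
}

// Builds a Widget in an aligned stack buffer, uses it, and destroys it
// explicitly, since placement new has no matching delete.
inline int demo_sum() {
    alignas(Widget) unsigned char storage[sizeof(Widget)];
    Widget* w = make_widget_in(storage, 1, 2, 3, 4, 'x', nullptr);
    int result = w->sum();
    w->~Widget();
    return result;
}
```

The compiler, not the caller, decides where each member lives inside the storage; the caller only has to supply enough correctly aligned bytes.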



Data members are referenced using an offset from the this pointer. If the object is laid out like this:

0: vtable
4: a
8: b
12: c
etc...

      

a will be accessible by dereferencing this + 4 bytes.
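To see which offsets the compiler actually chose, rather than assuming them, offsetof can be used on a standard-layout type. The sketch below reads one member by hand in the same "base address plus offset" way; Record, load_int and read_b are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical class with the members from the question (no vtable here).
struct Record {
    int a, b, c, d;
    char e;
    unsigned short* f;
};
static_assert(std::is_standard_layout<Record>::value,
              "offsetof is only guaranteed for standard-layout types");

// Loads an int stored `offset` bytes into an object of `object_size` bytes.
inline int load_int(const void* object, std::size_t offset, std::size_t object_size) {
    assert(offset + sizeof(int) <= object_size);
    int value;
    std::memcpy(&value, static_cast<const unsigned char*>(object) + offset, sizeof value);
    return value;
}

// Reads member b the way generated code would: object address + compiler-chosen offset.
inline int read_b(const Record& r) {
    return load_int(&r, offsetof(Record, b), sizeof(Record));
}
```

The offset comes from the compiler, so the code stays correct even if padding differs between platforms.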

+6




Non-virtual function calls are linked directly, like a C function. An object pointer (this) is passed as the first argument. No knowledge of the object layout is required to call the function.



+3




Basically, what you suggest doing is reading in a bunch of (hopefully not random) bytes, casting them to a known object type, and then calling a class method on that object. This might work, because those bytes end up as the "this" pointer in that class method. But you are taking a real chance that things are not where the compiled code expects them to be. And unlike Java or C#, there is no real "runtime" to catch these kinds of problems, so at best you get a core dump and at worst you get corrupted memory.

It looks like you want the C++ version of Java serialization / deserialization. There is probably a library for that.

+3




It looks like you are not storing the objects themselves in the buffer, but rather the data they are made of.

If this data is in memory in the order in which the fields are defined in your class (with the correct padding for the platform), and your type is a POD, you can memcpy the data from the buffer to a pointer to your type (or perhaps cast it, but be careful: there are some platform-specific gotchas with casts between different pointer types).

If your class is not a POD, then the in-memory layout of the fields is not guaranteed and you should not rely on any observable ordering, as this is allowed to change on every recompilation.

You can, however, initialize a non-POD with data from the POD.
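A minimal sketch of that two-step idea, assuming the buffer holds the fields in the same layout as a small POD struct: memcpy the bytes into the POD, then hand the POD to the non-POD class's constructor. PointData, Point and read_point_data are hypothetical names, not anything from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Plain data exactly as it sits in the buffer (assumed layout).
struct PointData {
    int x;
    int y;
};
static_assert(std::is_trivially_copyable<PointData>::value,
              "memcpy requires a trivially copyable type");

// A non-POD type that is initialized from the POD rather than from raw bytes.
class Point {
    PointData data_;
    std::string label_;
public:
    explicit Point(const PointData& d) : data_(d), label_("point") {}
    int sum() const { return data_.x + data_.y; }
};

// Copies sizeof(PointData) bytes out of the buffer.
// memcpy places no alignment requirement on `bytes`.
inline PointData read_point_data(const unsigned char* bytes, std::size_t size) {
    assert(size >= sizeof(PointData));
    PointData d;
    std::memcpy(&d, bytes, sizeof d);
    return d;
}
```

Because the bytes are copied rather than reinterpreted in place, the source buffer does not need to be aligned for PointData.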

Regarding addresses where non-virtual functions are located: they are statically linked at compile time somewhere in your code segment, which is the same for every instance of your type. Note that the "runtime" is not involved. When you write code like this:

class Foo{
   int a;
   int b;

public:
   void DoSomething(int x);
};

void Foo::DoSomething(int x){a = x * 2; b = x + a;}

int main(){
    Foo f;
    f.DoSomething(42);
    return 0;
}

      

the compiler generates code that does something like this:

  • Function main:
    • allocate 8 bytes on the stack for the object "f"
    • call the default initializer for the class "Foo" (does nothing in this case)
    • push the argument value 42 onto the stack
    • push the pointer to the object "f" onto the stack
    • call the function Foo_i_DoSomething@4 (the actual name is usually more complex)
    • load the return value 0 into the accumulator register
    • return to caller
  • Function Foo_i_DoSomething@4 (located elsewhere in the code segment):
    1. load the value "x" from the stack (pushed by the caller)
    2. multiply it by 2
    3. load the "this" pointer from the stack (pushed by the caller)
    4. calculate the offset of field "a" inside a Foo object
    5. add the computed offset to the this pointer loaded in step 3
    6. store the product calculated in step 2 at the address calculated in step 5
    7. load the value "x" from the stack, again
    8. load the this pointer from the stack, again
    9. calculate the offset of field "a" inside a Foo object, again
    10. add the computed offset to the this pointer loaded in step 8
    11. load the value of "a" stored at that address
    12. add the value of "a" loaded in step 11 to the value of "x" loaded in step 7
    13. load the this pointer from the stack, again
    14. calculate the offset of field "b" inside a Foo object
    15. add the computed offset to the this pointer loaded in step 13
    16. store the sum calculated in step 12 at the address calculated in step 15
    17. return to caller
In other words, it would be more or less the same code as if you had written the following (specifics, such as the name of the DoSomething function and the method of passing the this pointer, are up to the compiler):
class Foo{
    int a;
    int b;

    friend void Foo_DoSomething(Foo *f, int x);
};

void Foo_DoSomething(Foo *f, int x){
    f->a = x * 2;
    f->b = x + f->a;
}

int main(){
    Foo f;
    Foo_DoSomething(&f, 42);
    return 0;
}

      

+2




  • An object of POD type has, in this case, already been created (whether you call new or not; allocating the required storage is already enough), and you can access its members, including calling a function on that object. But this will only work if you know exactly the required alignment of T and the size of T (the buffer must be at least that large), and the alignment of all of T's members. Even for a POD type, the compiler is allowed to put padding between the members if it wants to. For non-POD types, you may have the same luck if your type has no virtual functions or base classes and no user-defined constructor (of course), and the same holds for its bases and all of its non-static members.

  • For all other types, all bets are off. You must first read the values using a POD and then initialize the non-POD type with that data.

+2




I store objects in a buffer .... If I know the total size of an object, is it acceptable to create a pointer to that memory and call functions on it?

This is acceptable to the extent that the use of casts is acceptable:

#include <iostream>

namespace {
    class A {
        int i;
        int j;
    public:
        int value()
        {
            return i + j;
        }
    };
}

int main()
{
    int buffer[] = { 1, 2 };  // two ints: assumed to match the size and alignment of A
    std::cout << reinterpret_cast<A*>(buffer)->value() << '\n';
}

      

Casting an object to and from something like raw memory is actually quite common, especially in the C world. However, if you are using a class hierarchy, it would make more sense to use pointers to member functions.

Let's say I have the following class: ...

if I know this class is 24 in size and I know the address where it starts in memory ...

This is where things get tricky. The size of an object includes the size of its data members (and any data members from any base classes), plus any padding, plus any function pointers or implementation-dependent information, minus anything saved by certain size optimizations (the empty base class optimization). If the resulting number is 0 bytes, then the object must still occupy at least one byte in memory. These things are a combination of language issues and general requirements that most processors have regarding memory access. Trying to make things work properly can be a real pain.

If you just allocate an object and cast it to and from raw memory, you can ignore these issues. But if you copy an object's internals into some kind of buffer, they quickly rear their head. The above code relies on a few general alignment rules (i.e., I know that class A will have the same alignment constraints as ints, and thus the array can safely be cast to an A; but I could not guarantee the same if I were casting parts of the array to A and parts of it to other classes with other data members).
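When one buffer holds objects of different types, the alignment of each has to be handled separately. One possible sketch uses std::align from <memory> to find the next suitably aligned address before constructing each object; emplace_aligned is a hypothetical helper written for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// Places a copy of `value` at the first suitably aligned address at or after
// `cursor`, then advances `cursor` and shrinks `space` past the new object.
// Returns nullptr if the object does not fit in the remaining space.
template <typename T>
T* emplace_aligned(void*& cursor, std::size_t& space, const T& value) {
    assert(cursor != nullptr);
    // std::align bumps `cursor` up to an aligned address and reduces `space`
    // by the padding it skipped, or returns nullptr if there is not enough room.
    void* spot = std::align(alignof(T), sizeof(T), cursor, space);
    if (spot == nullptr) return nullptr;
    T* obj = new (spot) T(value);
    cursor = static_cast<unsigned char*>(spot) + sizeof(T);
    space -= sizeof(T);
    return obj;
}
```

Storing a char followed by a double this way leaves the padding decision to std::align instead of hard-coding an offset that may only be right on one platform.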

Oh, and when copying objects you need to make sure you handle pointers correctly.

You may also be interested in things like Google Protocol Buffers or Facebook Thrift.


Yes, these problems are complex. And, yes, some programming languages sweep them under the rug. But there's an awful lot of stuff swept under that rug:

In Sun's HotSpot JVM, object storage is aligned to the nearest 64-bit boundary. In addition, each object has a 2-word header in memory. The JVM's word size is usually the platform's native pointer size. (An object consisting of only a 32-bit int and a 64-bit double, 96 bits of data, would require two words for the object header, one word for the int, and two words for the double. That's 5 words: 160 bits. Due to alignment, this object will occupy 192 bits of memory.)

This is because Sun relies on a relatively simple tactic for memory alignment problems (on an imaginary processor, a char could be allowed to exist anywhere in memory, an int anywhere that is divisible by 4, and a double might need to be allocated only at memory locations that are divisible by 32; but the strictest alignment requirement also satisfies every other alignment requirement, so Sun aligns everything according to the most restrictive one).

Another memory alignment tactic can reclaim some of that space.

+2




  • If the class does not contain virtual functions (and therefore its instances do not have a vptr), and if you make the right assumptions about how the class's member data is laid out in memory, then doing what you suggest might work (but might not be portable).
  • Yes, another way (more idiomatic, but not much safer: you still need to know how the class lays out its data) would be to use the so-called "placement new" operator and the default constructor.
+1




It depends on what you mean by "safe". Every time you cast a memory address to a pointer this way, you bypass the type safety features provided by the compiler and take on that responsibility yourself. If, as Chris says, you make a wrong assumption about memory layout or compiler implementation details, then you will get unexpected results and lose portability.

Since you are concerned about the "safety" of this programming style, it is probably worth taking the time to research portable and type-safe approaches, such as pre-existing libraries, or writing a constructor or assignment operator for this purpose.

0








