C++ pointer: strange undefined behavior

Compiling with -O2 (or -O3 for that matter) and running this program gives interesting results on my machine.

#include <iostream>

using namespace std;

int main()
{
    // Pointer to an int in the heap with a value of 5
    int *p = new int(5);
    // Deallocate the memory, but keep a dangling pointer
    delete p;
    // Write 123 to deallocated space
    *p = 123;
    // Allocate a long int in the heap
    long *x = new long(456);

    // Print values and pointers
    cout << "*p: " << *p << endl;
    cout << "*x: " << *x << endl;
    cout << "p:  " << p << endl;
    cout << "x:  " << x << endl;

    cout << endl << "Changing nothing" << endl << endl;

    // Print again without changing anything
    cout << "*p: " << *p << endl;
    cout << "*x: " << *x << endl;
    cout << "p:  " << p << endl;
    cout << "x:  " << x << endl;

    return 0;
}

      

g++ -O2 code.cc; ./a.out

*p: 123
*x: 456
p:  0x112f010
x:  0x112f010

Changing nothing

*p: 456
*x: 456
p:  0x112f010
x:  0x112f010

      

What I am doing is writing to the deallocated int in the heap that p points to, and then allocating a long with address x. My compiler consistently puts the long at the same address as p, so x == p. Now when I dereference p and print it, it retains the value 123, even though it has been overwritten by the long 456. Then *x prints as 456. What's even weirder is that later, without changing anything, printing the same values gives the expected results. I thought it was an optimization technique that only initializes *x when it is used, after the value of *p is printed, which would explain it. However, objdump says something else. Here is a truncated and commented objdump -d a.out:

00000000004008a0 <main>:
  4008a0:   41 54                   push   %r12
  4008a2:   55                      push   %rbp

Most likely the int allocation, where 0x4 is the size (4 bytes)
  4008a3:   bf 04 00 00 00          mov    $0x4,%edi
  4008a8:   53                      push   %rbx
  4008a9:   e8 e2 ff ff ff          callq  400890 <_Znwm@plt>

I have no idea what is going on here, but the pointer p is in 2 registers. Let's call the other one q.
q = p;
  4008ae:   48 89 c3                mov    %rax,%rbx

  4008b1:   48 89 c7                mov    %rax,%rdi

*p = 5;
  4008b4:   c7 00 05 00 00 00       movl   $0x5,(%rax)

delete p;
  4008ba:   e8 51 ff ff ff          callq  400810 <_ZdlPv@plt>

*q = 123;
  4008bf:   c7 03 7b 00 00 00       movl   $0x7b,(%rbx)

The long allocation and some other stuff (?). (8 bytes)
  4008c5:   bf 08 00 00 00          mov    $0x8,%edi
  4008ca:   e8 c1 ff ff ff          callq  400890 <_Znwm@plt>
  4008cf:   44 8b 23                mov    (%rbx),%r12d
  4008d2:   be e4 0b 40 00          mov    $0x400be4,%esi
  4008d7:   bf c0 12 60 00          mov    $0x6012c0,%edi

Initialization of the long before the printing
*p = 456;
  4008dc:   48 c7 00 c8 01 00 00    movq   $0x1c8,(%rax)

  4008e3:   48 89 c5                mov    %rax,%rbp

The printing
  4008e6:   e8 85 ff ff ff          callq  400870 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@plt>
........

      

Now, although *p has been overwritten by the initialization of the long (4008dc), it still prints as 123.

Hope I make sense and thanks for any help.

To make it clear: I'm trying to understand what's going on behind the scenes, what the compiler is doing, and why the resulting compiled code doesn't seem to match the output. I KNOW THIS IS UNDEFINED BEHAVIOR AND THAT ANYTHING COULD HAPPEN. But that means the compiler can generate any code, not that the CPU will make up instructions. Any ideas are appreciated.

PS: Don't worry, I don't plan on using this anywhere;)

EDIT: On a friend's machine (OS X), it gives the expected results even when optimized.

+3




3 answers


You stopped looking at the disassembly too soon (or at least you haven't posted the next few lines, which are the ones relevant to your question). They probably look something like this:

movl    %r12d, %esi
movq    %rax, %rdi
call    _ZNSolsEi
movq    %rax, %rdi
call    _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_

      

rbx and r12 are registers that must be preserved across function calls in the x64 ABI used by GCC on Linux. Right after the long is allocated, you will see this instruction:

mov    (%rbx),%r12d

      

The uses of rbx earlier in the instruction stream include:



mov    %rax,%rbx       ; store the `p` pointer in `rbx`

...

movl   $0x7b,(%rbx)    ; store 123 where `p` pointed (even though it has been freed before)

... 

mov    (%rbx),%r12d    ; read that value - 123 - back and into `r12`

      

Then, in the snippet I posted above (the part of the disassembly that didn't make it into your question, which corresponds to part of the statement cout << "*p: " << *p << endl), you see:

movl    %r12d, %esi    ; put 123 into `esi`, which is used to pass an argument to a function call

      

And so 123 is printed.
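To see the same effect in isolation, here is a minimal sketch that mimics the instruction sequence above in well-defined C++. It is not your program: the buffer slot, the function name simulate and the local cached (standing in for r12d) are hypothetical names chosen for illustration, and std::memcpy is used so that no freed memory is touched.

```cpp
#include <cstring>
#include <sstream>
#include <string>

// Hypothetical model of the optimized code path, not the asker's program.
// "slot" stands in for the heap block shared by p and x; "cached" stands in
// for the callee-saved register r12d.
std::string simulate() {
    alignas(long) unsigned char slot[sizeof(long)] = {};
    std::ostringstream out;

    int stored = 123;
    std::memcpy(slot, &stored, sizeof stored);   // like: movl $0x7b,(%rbx)

    int cached;
    std::memcpy(&cached, slot, sizeof cached);   // like: mov (%rbx),%r12d

    long big = 456;
    std::memcpy(slot, &big, sizeof big);         // like: movq $0x1c8,(%rax)

    // The first print uses the copy taken before the overwrite.
    out << "*p: " << cached << '\n';             // like: movl %r12d,%esi

    // A later print performs a fresh load from the shared block.
    int reread;
    std::memcpy(&reread, slot, sizeof reread);
    out << "*p: " << reread << '\n';

    return out.str();
}
```

On a little-endian machine where long is at least as wide as int, the first line reports 123 (the copy taken before the overwrite) and the second reports 456 (a fresh load), which matches the order of events in your output.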

+4




As you mentioned, this could be due to optimizations performed by the compiler. If you compile with -O0, it prints 456 for both values. Since p was deleted and x was allocated immediately afterwards, x points to the same address p was pointing to (this may not always be the case, but in your tests it most likely is). Hence *p and *x should dereference to the same value. If you change the print order, 456 is always printed for both values. I have swapped the first two cout statements in your code, as shown below:



#include <iostream>

using namespace std;

int main()
{
    // Pointer to an int in the heap with a value of 5
    int *p = new int(5);
    // Deallocate the memory, but keep a dangling pointer
    delete p;
    // Write 123 to deallocated space
    *p = 123;
    // Allocate a long int in the heap
    long *x = new long(456);

    // Print values and pointers
    cout << "*x: " << *x << endl;
    cout << "*p: " << *p << endl;
    cout << "p:  " << p << endl;
    cout << "x:  " << x << endl;

    cout << endl << "Changing nothing" << endl << endl;

    // Print again without changing anything
    cout << "*p: " << *p << endl;
    cout << "*x: " << *x << endl;
    cout << "p:  " << p << endl;
    cout << "x:  " << x << endl;

    return 0;
}

      

+1




You won't find the answer in your own source code or in what the compiler does, even if you generate an assembly listing from the compiler.

The undefined-ness happens in the memory allocation of the C runtime, which is an already-compiled binary linked with your test application. When you call new, the runtime library decides where the pointer goes. There is no guarantee that a new / delete / new sequence makes the second new return the same address; that is entirely implementation dependent.

If you REALLY want to know, then you need to build everything from source, including the source code for new, and then read how it is implemented and/or step through it in the debugger to see what is going on.
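As a rough way to observe what the allocator does on a given system without dereferencing a dangling pointer, something like the sketch below could be used. The function name allocator_reused_address is hypothetical, and the result is only an observation about one run on one implementation, not a guarantee; an optimizing compiler is also allowed to elide or merge such allocations.

```cpp
#include <cstdint>

// Hypothetical helper: reports whether a new / delete / new sequence
// happened to return the same address on this run.
bool allocator_reused_address() {
    int* p = new int(5);
    // Record the address as an integer while the pointer is still valid.
    const auto before = reinterpret_cast<std::uintptr_t>(p);
    delete p;

    long* x = new long(456);
    const auto after = reinterpret_cast<std::uintptr_t>(x);
    const bool same = (before == after);
    delete x;

    return same;
}
```

Either result is legal. Comparing the integer values saved before the delete avoids reading through or otherwise using the freed pointer.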

+1








