Can't understand simple C code output involving function calls on Linux

I wrote some simple code while trying to understand how function calls work, but I can't explain its output.

#include <stdio.h>

int* foo(int n)
{
    int *p = &n;
    return p;
}

int f(int m)
{
    int n = 1;
    return 999;
}

int main(int argc, char *argv[])
{
    int num = 1;
    int *p = foo(num);
    int q = f(999);
    printf("[%d]\n[%d]\n", *p, q);
    /* printf("[%d]\n", *q); */
}

      

Output:

[999]
[999]

      

Why is *p 999?

Then I changed my code like this:

#include <stdio.h>

int* foo(int n)
{
    int *p = &n;
    return p;
}

int f()
{
    int n = 1;
    return 999;
}

int main(int argc, char *argv[])
{
    int num = 1;
    int *p = foo(num);
    int q = f();
    printf("[%d]\n[%d]\n", *p, q);
    /* printf("[%d]\n", *q); */
}

      

Output:

[1]
[999]

      

Why is *p 1 here? I'm on Linux using gcc, and Clang gives the same result.

+3




6 answers


Apart from the fact that your code invokes undefined behavior because it returns a pointer to a stack variable, you asked why the behavior changes when the signature of f() changes.

The reason

The reason lies in how the compiler lays out the stack frames of the functions. Suppose the compiler creates the following stack frame for foo():

Address Contents  
0x199   local variable p
0x200   Saved register A that gets overwritten in this function
0x201   parameter n
0x202   return value
0x203   return address

      

And for f(int m) the stack frame looks quite similar:

Address Contents  
0x199   local variable n
0x200   Saved register A that gets overwritten in this function
0x201   parameter m
0x202   return value
0x203   return address

      

Now, what happens if you return a pointer to n from foo? The resulting pointer is 0x201. After foo returns, the top of the stack is at 0x204. The memory itself is left unchanged, so you can still read the value 1 there. This works until another function is called (in your case, f). When f is called, location 0x201 is overwritten with the value of m.

If you then access that location (and you do so in the printf statement), it reads 999. If you had copied the value from that location before calling f(), you would have found the value 1.

Sticking with our example, the stack frame for f() with no parameters would look like this:

Address Contents  
0x200   local variable n
0x201   Saved register A that gets overwritten in this function
0x202   return value
0x203   return address

      



Since the local variable n is initialized with 1, you can read 1 at location 0x200 after calling f(). If you now read the value at location 0x201, you get the contents of the saved register.

Some additional remarks

  • It is important to understand that the explanation above only illustrates the mechanism behind what you observe.
  • The actual behavior depends on the toolchain used and on the so-called calling conventions.
  • It is easy to see that it is sometimes difficult to predict what will happen. The situation is quite analogous to accessing memory after freeing it, which is why the outcome is generally unpredictable.
  • This behavior may even change with the optimization level. For example, I can imagine that with -O3 the observation will be different, because the unused variable n no longer appears in the binary.
  • Once you understand the mechanism, you should also see why writing through the address returned from foo can lead to serious problems.

For the brave who want to verify this explanation by experiment

First of all, it is important to see that the above explanation is independent of the actual layout of the stack frames. I just presented the layout to have an illustration that is easy to understand.

If you want to test the behavior on your own computer, I suggest that you take your favorite debugger and look at the addresses where local variables and parameters are placed to see what actually happens. Note: changing the signature of f changes the information pushed onto the stack. Thus, the only really "portable" test is to change the argument passed to f() and watch the value that p points to.

When f(void) is called, the information pushed onto the stack differs significantly, and the value at the location p points to no longer necessarily depends on f's parameters or locals. It may also depend on stack variables of the main function.

On my machine, for example, the experiment showed that the "1" you read in the second variant comes from saving the register that was used to store 1 into num, as it appears to be reused to load n ...

I hope this gives you some insight. Leave a comment if you have further questions. (I know it is a bit tricky to understand.)

+4




You are invoking undefined behavior. You cannot return the address of a local variable (in this case the argument int n) and expect it to be useful later.



+2




A local variable, such as n in your code:

int* foo(int n)
{
    int *p = &n;
    return p;
}

      

"disappears" as soon as the function foo ends.

You cannot use it afterwards, because accessing the variable can give you unpredictable results. You can write something like this instead:

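#include <stdio.h>

/* n points to a variable owned by the caller, so it stays valid after foo returns. */
int* foo(int* n)
{
    *n = 999;
    return n;
}

int main(int argc, char *argv[])
{
    int num = 1;
    int *p = foo(&num);
    printf("[%d]\n", *p);
}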

      

because your variable num still exists at the point where it is printed. In the meantime, there is no need to know about it.

+2




In your first example, when you do

int num = 1;
int *p = foo(num);

      

where foo() is

int* foo(int n)
{
    int *p = &n;
    return p;
}

      

When the variable num is passed from main(), it is passed to foo by value. In other words, a copy of num, called n, is created on the stack. Both num and n hold the same value, but they are different variables and therefore have different addresses.

When you return p from foo(), main() receives an address that is different from the address of the num declared in main().

The same explanation applies to your modified program.

Take a look at another example to clarify:

int i = 2;

int *foo()
{
    return &i;
}

int main()
{
    i = 1;
    int *p = foo();
    return 0;
}

      

In this case i is a global variable with static storage (it lives neither on the stack nor on the heap), and both main() and foo() refer to the same i: same address and same value.

Let's look at the third example:

int i = 2;

int *foo(int i)
{
    return &i;
}

int main()
{
    int i = 1;
    int *p = foo(i);
    return 0;
}

      

Here, although a global i exists, it is hidden by the local variable i in main(), and that local is what is passed to foo(). So the &i returned from foo, that is, the value of p in main(), is the address of foo's own copy and differs from the address of the variable i declared in main().

Hopefully this clarifies variable scope and passing by value.

0




It's not easy to say without the assembly output, but this is my guess:

Locals and parameters are pushed onto the stack. Therefore, when foo is called, it returns the address of its first parameter, which is on the stack.

In the first example, you pass a parameter to the second function, and that parameter is also pushed onto the stack, exactly where p points. Therefore, it overwrites the value *p.

In the second example, the stack is not touched by the second call. The old value (that of num) remains there.

0




This undefined behavior comes from the way the stack is used:

int *p = foo(num);
int q = f(999);

      

In the first case, when foo takes &n, it actually stores the stack address where the copy of num was kept. Then foo(num) exits and f(999) takes over with the parameter 999. Since it uses the same stack, the parameter 999 now sits in the same place on the stack where the copy of num was stored. And we know that the stack is contiguous.

This is why 999 is printed. In fact, both values printed come from the same place on the stack.

In the second case, the copy of num is not overwritten, since no parameter is passed to f(), so this prints as expected.

0








