When is undefined behavior well known and accepted?

We know this behavior is undefined, and we (more or less) know the reasons (performance, cross-platform differences) behind most of it. Assuming a given platform, say 32-bit Windows, can we treat undefined behavior as well known and consistent across that platform? I realize there is no generic answer, so I will limit myself to two common UBs that I see quite often in production code (code that has been in use for years).

1) Link. Given this union:

union {
    int value;
    unsigned char bytes[sizeof(int)];
} test;

Initialized as follows:

test.value = 0x12345678;

Then printed with:

for (int i=0; i < sizeof(test.bytes); ++i)
    printf("%d\n", test.bytes[i]);

2) Link. Given an unsigned short* (an array), casting it to (for example) float* and accessing the data through the cast pointer (link; no padding between the array elements).
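
A minimal sketch of the kind of code example 2 describes (the names and array contents are illustrative, not from the question); the cast violates the strict-aliasing rule and the array may not be aligned for float, which is what makes it UB:

#include <stdio.h>

int main(void)
{
    unsigned short raw[4] = {0x0000, 0x3f80, 0x0000, 0x4000};  /* arbitrary sample values */

    /* Reinterpret the unsigned short array as float: the construct from
     * example 2. Undefined behavior per the Standard (aliasing, and possibly
     * alignment), even though it often "works" on x86. */
    float *f = (float *)raw;

    for (size_t i = 0; i < sizeof raw / sizeof(float); ++i)
        printf("%f\n", f[i]);

    return 0;
}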

Is code that relies on such well-known UBs (the kind that does work in practice) fragile because the compiler might change (and the compiler version will almost certainly change), or, even though it is UB from a cross-platform point of view, does it really rely only on specific platform details, so it won't break unless we change the platform? Does the same reasoning also apply to unspecified behavior (when the compiler documentation says nothing about it)?

EDIT: As per this post, since C99 type punning through a union is merely unspecified, not undefined.



2 answers


First of all, any compiler implementation may define any behavior it likes in any situation which, from the Standard's point of view, invokes Undefined Behavior.

Second, code written for a particular compiler implementation is free to use any behavior documented by that implementation; such code, however, may not be usable with other implementations.
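
For illustration (a sketch of mine, not from the answer): GCC documents that reading a different union member than the one most recently written to (type punning) is supported even with -fstrict-aliasing, as long as the access goes through the union type. Code that depends on that documented promise can make the dependency explicit so it fails loudly elsewhere:

/* Illustrative guard: refuse to build with a compiler that does not make
 * GCC's documented promise about union-based type punning. */
#if !defined(__GNUC__)
#error "This code relies on GCC-documented union type punning"
#endif

union pun {
    int value;
    unsigned char bytes[sizeof(int)];
};

int low_byte(int v)
{
    union pun p;
    p.value = v;
    return p.bytes[0];   /* first byte of the object representation */
}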

One of the longstanding weaknesses of C is that, while there are many situations where a construct would be handled usefully by some implementations but cannot be handled meaningfully by all, only a small minority of those situations give code any way to indicate that a compiler which will not handle the construct in a particular fashion should refuse to compile it. In addition, there are many cases where the Standard allows full-blown UB even though on most implementations the "natural" consequences would be far more limited. Consider, for example (assuming int is 32 bits):

#include <stdint.h>

int weird(uint16_t x, int64_t y, int64_t z)
{
  int r=0;
  if (y > 0) return 1;
  if (z < 0x80000000L) return 2;
  if (x > 50000) r |= 31;
  if (x*x > z) r |= 8;    /* x*x is evaluated as int and may overflow */
  if (x*x < y) r |= 16;
  return r;
}

If the above code were run on a machine that simply ignores integer overflow, passing 50001,0,0x80000000L should result in 31 being returned; passing 50000,0,0x80000000L could result in 0, 8, 16, or 24 being returned, depending on how the implementation handles the comparisons. The C Standard, however, would allow the code to do anything at all in either of those cases; because of that, some compilers may determine that none of the if statements after the first two can be true in any situation that does not invoke Undefined Behavior, and may therefore assume that r is always zero. Note that such an inference affects the behavior of a statement which precedes the Undefined Behavior.
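
As a sketch (mine, assuming the intent is to compare the full mathematical product), the same comparisons can be written without signed overflow, which removes the compiler's licence to assume the later branches away:

#include <stdint.h>

int weird_checked(uint16_t x, int64_t y, int64_t z)
{
  int r = 0;
  if (y > 0) return 1;
  if (z < 0x80000000L) return 2;
  if (x > 50000) r |= 31;

  /* Widen before multiplying: the product of two uint16_t values always
   * fits in int64_t, so no signed overflow can occur. */
  int64_t sq = (int64_t)x * x;
  if (sq > z) r |= 8;
  if (sq < y) r |= 16;
  return r;
}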



One thing I would really like to see is a concept of "implementation-constrained" behavior, which would sit somewhere between Undefined Behavior and implementation-defined behavior: compilers would have to document all the possible consequences of certain constructs which, under the old rules, would be Undefined Behavior, but, unlike implementation-defined behavior, no implementation would be required to pick one specific outcome; implementations would be allowed to state that a given construct can have arbitrary unconstrained consequences (full UB), though doing so would be discouraged. In the case of something like integer overflow, a reasonable compromise would be to say that the result of an expression that overflows may be a "magic" value which, if explicitly cast, yields an arbitrary (but "normal") value of the indicated type, but which may otherwise appear to hold arbitrarily changing values that may or may not be representable. Compilers would be allowed to assume that the result of an operation is not an overflowed result, but would have to refrain from drawing inferences about the operands. To use a loose analogy, the behavior would be similar to that of a floating-point NaN: explicitly casting it can yield any arbitrary result other than a NaN.

IMHO it would be very useful for C to combine the concept of implementation-constrained behavior described above with some standard predefined macros that would let code test whether an implementation makes particular promises about its behavior in various situations. In addition, it would be helpful if there were a standard means by which a piece of code could request a specific "dialect" [a combination of int size, implementation-constrained behaviors, etc.]. One could then write a compiler for any platform which could, on request, apply promotion rules as though int were exactly 32 bits. For example, given code like:

uint64_t l1,l2; uint32_t w1,w2; uint16_t h1,h2;
...
l1+=(h1+h2);
l2+=(w2-w1);

A 16-bit compiler might be fastest if it did the math on h1 and h2 using 16 bits, and a 64-bit compiler might be fastest if it added to l2 the 64-bit result of subtracting w1 from w2; but if the code was written for a 32-bit system, having compilers for the other two platforms generate code that behaves as it would on a 32-bit system would be more useful than having them generate code that performs different calculations, no matter how much faster the latter code might be.
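
The 32-bit intent can be expressed today only at the cost of explicit casts; a sketch (mine, with a hypothetical wrapper function) of what that looks like:

#include <stdint.h>

void accumulate(uint64_t *l1, uint64_t *l2,
                uint32_t w1, uint32_t w2,
                uint16_t h1, uint16_t h2)
{
    /* Force the sub-expressions to be evaluated with 32-bit wraparound,
     * regardless of how wide the implementation's int happens to be. */
    *l1 += (uint32_t)((uint32_t)h1 + h2);
    *l2 += (uint32_t)(w2 - w1);
}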

Unfortunately, there is at present no standard means by which code can request such semantics [a fact which, in many cases, will limit the efficiency of 64-bit code]; about the best one can do is document the code's environmental requirements explicitly and hope that whoever uses the code reads them.



Undefined behavior means, first and foremost, a very simple thing: the behavior of the code in question is not defined, so the C Standard says nothing about what might happen. Don't read more into it than that.

If the C Standard does not define something, your platform may still define it as an extension. If you are in such a case, you can use it on that platform. But make sure the extension is documented and that it will not change in the next version of your compiler.



Your examples are off the mark for several reasons. As discussed in the comments, unions like yours exist precisely for type punning, and in particular for that kind of memory access, since access through any character type is always allowed. Your second example is really bad because, contrary to what you seem to be implying, it is not acceptable on any platform I know of: short and float usually have different alignment requirements, and using such a cast will almost certainly crash your program. And third, you are arguing about C on Windows, which is known for not following the C standards.
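
A sketch (mine) of the two accesses that are always allowed or well defined, as opposed to the pointer cast in example 2: inspecting an object's bytes through an unsigned char pointer, and copying its representation into an object of another type with memcpy (assuming, as on the platforms discussed, that float and uint32_t have the same size):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    float f = 1.5f;

    /* Always allowed: examine any object's bytes through a character type. */
    const unsigned char *p = (const unsigned char *)&f;
    for (size_t i = 0; i < sizeof f; ++i)
        printf("%02x ", (unsigned)p[i]);
    printf("\n");

    /* Well-defined alternative to the pointer cast: copy the bytes into an
     * object of the other type. */
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    printf("%08x\n", (unsigned)bits);

    return 0;
}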
