Get the value of a comparison statement

As I understand it, the cmp command sets some of the bits in your flags register. Then you can use instructions like jle, jnp, etc. to branch them.

I am wondering how you can recover an integer value from a comparison.

Example: valid syntax c

y = x[a >= 13];

      

So a is compared with 13 to get a true or false value, which is interpreted as 1 or 0, respectively. However, that 1 or 0 must be fed into the array access as an integer. What will the compiler do?

Some things I can think of are the following:

Make a comparison and then go to x [0] or x [1]

Do the comparison and then answer to do either tmp = 0 or tmp = 1 and then x [tmp]

Maybe some fancy logic on the flags (not sure if there are instructions for accessing the flags directly)

I tried to see what gcc is spitting out for this code example, but there is no way to pick the logic out of all the extra garbage it throws in.

I am working on a compiler, so any suggestions would be appreciated.

+3


source to share


1 answer


There are three ways you can do this. I will go through them in turn.

One way to do this is basically what you describe in the question: do a comparison and then answer with code that separately implements both possibilities. For example:

    cmp  [a], 13                     ; compare 'a' to 13, setting flags like subtraction
    jge  GreaterThanOrEqual          ; jump if 'a' >= 13, otherwise fall through

    mov  eax, [x * 0 * sizeof(x[0])] ; EAX = x[0]
    jmp  Next                        ; EAX now loaded with value, so do unconditional jump

GreaterThanOrEqual:
    mov  eax, [x * 1 * sizeof(x[0])] ; EAX = x[1]
                                     ; EAX now loaded with value; fall through

Next:
    mov  [y], eax                    ; store value of EAX in 'y'

      

Usually the compiler will try to store more values ​​in registers, but that should give you an idea of ​​the underlying logic. It does the comparison, and either leads to the command that reads / loads x[1]

, or goes through the command that reads / loads x[0]

. It then moves on to the statement that stores that value in y

.

You should be able to see that this is relatively inefficient due to all the branching required. Thus, compiler optimizations will not generate code like this, especially in the simple case where you have a basic ternary expression:

(a >= 13) ? 1 : 0

      

or even:

(a >= 13) ? 125 : -8

      

There are bit tricks one can use for this comparison and get the corresponding integer without having to branch.

This brings us to the second way of doing it, which is to use an instruction SETcc

. Part cc

means "condition code" and all condition codes are the same as for conditional branch instructions. (In fact, you could write all conditional jump statements as Jcc

.) For example, jge

means "jump if greater than or equal"; likewise setge

means "set if greater than or equal". Plain.

The trick about SETcc

is that it establishes a register BYTE size, which basically means AL

, CL

, DL

or BL

(have more options, you can set the high byte of one of these registers, and / or in a mode with 64-bit long numbers have more options, but this is the main choice for operands).

Here is some sample code that implements this strategy:

xor   edx, edx        ; clear EDX
cmp   [a], 13         ; compare 'a' to 13, setting flags like subtraction
setge dl              ; set DL to 1 if greater-than-or-equal, or 0 otherwise
mov   eax, [x * edx * sizeof(x[0])]
mov   [y], eax

      

Cool, huh? The branch has been eliminated. The required 0 or 1 is loaded directly into DL

and then used as part of the load ( MOV

) command .

The only thing that is a little confusing here is that you need to know what DL

is the low byte of a full 32-bit register EDX

. That is why we had to clear the full register beforehand EDX

, since it setge dl

only affected the least significant byte, but we want the full one to EDX

be either 0 or 1. It turns out that pre-zeroing the full register is the most optimal way to do this on all processors , but there are others ways, for example use MOVZX

after command SETcc

. The linked answer goes into detail on this, so I won't redistribute it. The key point is thatSETcc

sets only the low byte of the register, but subsequent instructions require the entire 32-bit register to have a value, so you need to clear the high bytes.

Anyway, this is the code that compilers will generate 99% of the time when you write something like y = x[a >= 13]

. The instruction SETcc

gives you a way to set a byte according to the state of one or more flags, just like you can fork flags. Basically, this was what you thought of an instruction that allows you to access flags directly.

This implements the logic for

(a >= 13) ? 1 : 0

      

but what if you want to do



(a >= 13) ? 125 : -8

      

as I mentioned earlier? Well, you still use the instruction SETcc

, but after that you pretend a little to "fix" the resulting 0 or 1 values ​​that you really want. For example:

xor   edx, edx
cmp   [a], 13
setge dl
dec   edx
and   dl, 123
add   edx, 125
; do whatever with EDX

      

This works for almost any binary choice (two possible values, depending on the state), and the optimizing compilers are smart enough to handle it. Not yet distributed code; very cool.

There is a third way this can be implemented, but it is conceptually very similar to the second way we just talked about. It uses a conditional move statement, which is another way to create an unallocated set based on the status of the flags. Conditional move statement CMOVcc

where cc

again refers to the "condition code" as in the previous examples. The manual CMOVcc

was introduced with the Pentium Pro around 1995 and has been in all processors since then (well, not the Pentium MMX, but the Pentium II and later), so essentially everything you've ever seen today.

The code is very similar, except that it is - as the name suggests - a conditional move, so a little more pre-configuration is required. Specifically, you need to get the candidate values ​​loaded into the registers so you can pick the right one:

xor    edx, edx    ; EDX = 0
mov    eax, 1      ; EAX = 1
cmp    [a], 13     ; compare 'a' to 13 and set flags
cmovge edx, eax    ; EDX = (a >= 13 ? EAX : EDX)
mov    eax, [x * edx * sizeof(x[0])]
mov    [y], eax

      

Note that moving EAX

to EDX

conditional - this only happens if the flags indicate a condition ge

(greater than or equal to). Thus, it works with the main C procedure as indicated in the comment to the right of the statement. If flags denote ge

, then EAX

moves to EDX

. Otherwise, nothing is moved, but EDX

retains its original value.

Note that while some compilers (Intel specific compiler known as ICC) prefer commands CMOV

over instructions SET

, this has no advantage over the previous implementation we saw earlier with setge

. In fact, this is really suboptimal.

When it CMOV

really comes to its senses, it allows you to eliminate that bit of twisting code needed to get values ​​other than the good old 0 or 1. For example:

mov    edx, -8     ; EDX = -8
mov    eax, 125    ; EAX = 125
cmp    [a], 13     ; compare 'a' to 13 and set flags
cmovge edx, eax    ; EDX = (a >= 13 ? EAX : EDX)
; do whatever with EDX

      

This is now less instructions because the correct value has been moved directly into a register EDX

, as opposed to setting 0 or 1, and then we need to manipulate it into the values ​​we want. Therefore, compilers will use instructions CMOV

(when targeting processors that support them, as mentioned) to implement more complex logic, such as

(a >= 13) ? 125 : -8

      

although they could use them using one of the other approaches. You also need conditional moves where the operands on either side of the condition are not compile-time constants (i.e., they are values ​​in registers known only at runtime).

Does this help? :-)

I tried to see what gcc is spitting out for this code example, but there is no way to pick the logic out of all the extra garbage it throws in.

Yes. I have some tips for you:

  • Put your code on a very simple function that only does what you want to learn. You will want to take the input as a parameter (so that the optimizer cannot trivially dump the constant), and you will want to return the result from the function. For example:

    int Foo(int a)
    {
        return a >= 13;
    }
    
          

    Refund a would bool

    also work here. If you were to use the conditional operator to return anything other than 0 or 1, you would need to return int

    , of course.

    Anyway, you can now see exactly what build commands the compiler generates to do this, without any other hindrance. Make sure you enable optimization ; looking at the debug code is not instructive and very noisy.

  • Make sure you are asking GCC to generate assembly lists using the Intel / MASM format, which is much easier to read (at least in my opinion) than its default format, GAS / AT&T syntax.All the above assembly code examples written using Intel syntax. Required spell:

    gcc -S -masm=intel MyFile.c
    
          

    where -S

    generates an assembly list for the original source file, and -masm=intel

    switches the assembly list syntax format to Intel style.

  • Use a good tool like the Godbolt Compiler Explorer that automates all of this to dramatically reduce turn times. As another bonus, it encodes build commands according to the lines of C code in the original source.

    Here's an example of what you will use to learn this . The original source is at the far left. The middle pane shows the modern processor build of GCC 7.1, which supports instructions CMOV

    . The right-hand pane shows a GCC 7.1 build for a very old processor that doesn't support instructions CMOV

    . Cool, isn't it? And you can easily manipulate the compiler switches and watch the result change. For example, if you use -m64

    (64-bit) instead of -m32

    (32-bit), then you will see that this parameter is passed in register ( EDI

    ), not pushed onto the stack, and must be loaded into the register as the very first instruction in the function.

+3


source







All Articles