Getting GCC / Clang to use CMOV
I have a simple tagged union of values. A value can be either an int64_t or a double. I am implementing addition on these unions, with the caveat that if both arguments hold int64_t values, then the result must also be an int64_t.
Here is the code:
#include <stdint.h>

union Value {
  int64_t a;
  double b;
};

enum Type { DOUBLE, LONG };

// Value + type.
struct TaggedValue {
  Type type;
  Value value;
};

void add(const TaggedValue& arg1, const TaggedValue& arg2, TaggedValue* out) {
  const Type type1 = arg1.type;
  const Type type2 = arg2.type;
  // If both args are longs then write a long to the output.
  if (type1 == LONG && type2 == LONG) {
    out->value.a = arg1.value.a + arg2.value.a;
    out->type = LONG;
  } else {
    // Convert argument to a double and add it.
    double op1 = type1 == LONG ? (double)arg1.value.a : arg1.value.b; // Why isn't CMOV used?
    double op2 = type2 == LONG ? (double)arg2.value.a : arg2.value.b; // Why isn't CMOV used?
    out->value.b = op1 + op2;
    out->type = DOUBLE;
  }
}
The GCC output at -O2 is here: http://goo.gl/uTve18. It is reproduced below in case the link doesn't work.
add(TaggedValue const&, TaggedValue const&, TaggedValue*):
cmp DWORD PTR [rdi], 1
sete al
cmp DWORD PTR [rsi], 1
sete cl
je .L17
test al, al
jne .L18
.L4:
test cl, cl
movsd xmm1, QWORD PTR [rdi+8]
jne .L19
.L6:
movsd xmm0, QWORD PTR [rsi+8]
mov DWORD PTR [rdx], 0
addsd xmm0, xmm1
movsd QWORD PTR [rdx+8], xmm0
ret
.L17:
test al, al
je .L4
mov rax, QWORD PTR [rdi+8]
add rax, QWORD PTR [rsi+8]
mov DWORD PTR [rdx], 1
mov QWORD PTR [rdx+8], rax
ret
.L18:
cvtsi2sd xmm1, QWORD PTR [rdi+8]
jmp .L6
.L19:
cvtsi2sd xmm0, QWORD PTR [rsi+8]
addsd xmm0, xmm1
mov DWORD PTR [rdx], 0
movsd QWORD PTR [rdx+8], xmm0
ret
GCC generated code with a lot of branches. I know the input data is fairly random, i.e. it contains a random mix of int64_t and double values. I would like at least the conversion to double to be done with the equivalent of a CMOV instruction. Is there a way to coax GCC into generating that code? Ideally, I would like to run a benchmark on real data to see how the branch-heavy code performs against code with fewer branches but more expensive CMOV instructions. It may turn out that the code GCC generates by default is faster, but I would like to confirm that. I could write the inline assembly myself, but I would rather not.
The interactive compiler link is a good way to check the assembly. Any suggestions?
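For reference, here is a rough sketch of the kind of branch-free variant I would want to benchmark against. One caveat: x86 CMOVcc only operates on general-purpose registers, not XMM registers, so the select has to happen on the 64-bit integer bit pattern (or with SSE mask/blend instructions). The names as_double_branchless and add_branchless are hypothetical helpers made up for this sketch, and nothing guarantees the compiler emits CMOV or any particular sequence for them; the assembly still has to be inspected.

```cpp
#include <stdint.h>
#include <string.h>

union Value {
  int64_t a;
  double b;
};

enum Type { DOUBLE, LONG };

struct TaggedValue {
  Type type;
  Value value;
};

// Hypothetical helper: returns the value as a double without branching on
// the tag. The int64 -> double conversion is always performed, and the
// result is chosen afterwards with a bit mask.
static inline double as_double_branchless(const TaggedValue& v) {
  // Read the stored 64 bits both as an integer and as a raw bit pattern.
  // memcpy avoids reading an inactive union member.
  int64_t as_int;
  uint64_t raw_bits;
  memcpy(&as_int, &v.value, sizeof as_int);
  memcpy(&raw_bits, &v.value, sizeof raw_bits);

  // Unconditional conversion; harmless (just wasted work) when the tag is DOUBLE.
  const double converted = (double)as_int;
  uint64_t conv_bits;
  memcpy(&conv_bits, &converted, sizeof conv_bits);

  // mask is all ones when the tag is LONG, all zeros otherwise.
  const uint64_t mask = 0 - (uint64_t)(v.type == LONG);
  const uint64_t selected = (conv_bits & mask) | (raw_bits & ~mask);

  double result;
  memcpy(&result, &selected, sizeof result);
  return result;
}

// Same contract as add() above; only the double path is changed.
void add_branchless(const TaggedValue& arg1, const TaggedValue& arg2,
                    TaggedValue* out) {
  if (arg1.type == LONG && arg2.type == LONG) {
    out->value.a = arg1.value.a + arg2.value.a;
    out->type = LONG;
  } else {
    out->value.b = as_double_branchless(arg1) + as_double_branchless(arg2);
    out->type = DOUBLE;
  }
}
```

This still leaves the outer "both are LONG" branch in place. The trade-off is that the cvtsi2sd conversion runs on every double-path call, which is exactly the cost I want to measure against the mispredicted branches.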