GCC C vector extension: how to check the result of a comparison (for conditional assignment, etc.)?
Background: GCC C's built-in vector extensions allow for a fairly natural representation of SIMD vectors as C types. According to the documentation, many built-in operations (+, -, etc.) are supported. However, the ternary operator as well as the logical operators (&&, ||) only work in C++ for some reason. This is a problem for an all-C codebase.
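For readers who have not used the extension, here is a minimal sketch of the kind of declaration the snippets in this question assume. The typedef name v4si matches the code below; the helper name negative_lanes_of_sum is hypothetical and only there for illustration.

```c
/* Four 32-bit ints packed into one 16-byte vector (GCC vector extension). */
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical helper: add element-wise, then flag the lanes whose sum is negative. */
static inline v4si negative_lanes_of_sum(v4si a, v4si b)
{
    v4si sum = a + b;          /* built-in element-wise addition */
    v4si zero = {0, 0, 0, 0};
    return sum < zero;         /* each lane: -1 (all bits set) if true, 0 if false */
}
```

The comparison yields a vector of the same shape rather than a single truth value, which is why it cannot be dropped straight into an if statement.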
Question: In GCC C, how does one implement SIMD-compatible [branchless] conditionals of the form:
v4si a = {2,-1,3,4}, b, indicesLessThan0;
indicesLessThan0 = a < 0;
b = indicesLessThan0 ? a : 0;
And, more generally, how to execute an arbitrary independent block of statements based on the same result:
v4si c = {9,8,7,6}, d;
for (int i = 0; i < 4; i++) {
    if (indicesLessThan0[i]) { // consider tests one by one
        b[i] = a[i];     // as the ternary operator does above
        d[i] = c[i] + 1; // some other independent operation
    }
    else {
        b[i] = 0;        // as the ternary operator does above
        d[i] = c[i] - 1; // another independent operation
    }
}
If branching over a whole block of statements is hard (branching in SIMD code is bad), it would be fine to re-run the ternary test for each additional statement, at the cost of (presumably) some efficiency:
d = indicesLessThan0 ? c + 1 : c - 1; // the other operation in the loop
But the ternary operator doesn't work in C for some reason the manual doesn't explain. Is there another easy way? Some way to use if statements?
I found 3 solutions after throwing the kitchen sink at the code.
1. Switch to g++. Not too complicated, and it turns out that most of the code can be converted by simply placing (type *) casts in front of all the *alloc calls. Then I can just do:
    v16s8 condStor = test ? a : b;
2. Better yet, I found that you can just bit-bash using different mixes of & and |, just like everyone does with bits inside integers. The trick is that a vector comparison sets every true element to 11111111... (that is, -1, all bits set), which makes the selected values stick when you apply bitwise operators.
3. Better yet, "type punning 101" with a builtin:
    v16s8 condStor = b;
    __builtin_ia32_maskmovdqu (a, test, (char *)(&condStor));
This uses a builtin designed to do what #2 does in one fell swoop.
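The bitwise approach (the second solution above) is described without code, so here is a minimal sketch of it in plain GCC C, using the v4si type from the question. The helper name select_v4si is hypothetical, and the sketch assumes the mask comes from a vector comparison, so each lane is either 0 or -1.

```c
/* Four 32-bit ints packed into one 16-byte vector (GCC vector extension). */
typedef int v4si __attribute__((vector_size(16)));

/*
 * Hypothetical helper: per-lane equivalent of mask ? if_true : if_false.
 * Assumes every lane of mask is 0 or -1, as produced by a vector comparison.
 */
static inline v4si select_v4si(v4si mask, v4si if_true, v4si if_false)
{
    /* Lanes with all bits set keep if_true; lanes with no bits set keep if_false. */
    return (mask & if_true) | (~mask & if_false);
}
```

With a = {2,-1,3,4} and c = {9,8,7,6} from the question, taking mask = a < zero (zero being an all-zero v4si), select_v4si(mask, a, zero) plays the role of b = indicesLessThan0 ? a : 0, and select_v4si(mask, c + 1, c - 1) plays the role of the second ternary. Both sides are always computed, so this only suits operations without side effects.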
Not convinced? Check the generated assembly:
1.
    pxor %xmm1, %xmm1
    movdqa -64(%rbp), %xmm0
    pcmpeqb %xmm1, %xmm0
    pcmpeqd %xmm1, %xmm1
    pandn %xmm1, %xmm0
    pxor %xmm1, %xmm1
    pcmpgtb %xmm0, %xmm1
    movdqa %xmm1, %xmm0
    movdqa -32(%rbp), %xmm2
    movdqa -16(%rbp), %xmm1
    pand %xmm0, %xmm1
    pandn %xmm2, %xmm0
    por %xmm1, %xmm0
    movaps %xmm0, -80(%rbp)
2.
    movdqa -64(%rbp), %xmm0
    movdqa %xmm0, %xmm1
    pand -16(%rbp), %xmm1
    pcmpeqd %xmm0, %xmm0
    pxor -64(%rbp), %xmm0
    pand -32(%rbp), %xmm0
    por %xmm1, %xmm0
    movaps %xmm0, -80(%rbp)
3.
    movdqa -32(%rbp), %xmm0
    movaps %xmm0, -80(%rbp)
    leaq -80(%rbp), %rax
    movdqa -16(%rbp), %xmm0
    movdqa -64(%rbp), %xmm1
    movq %rax, %rdi
    maskmovdqu %xmm1, %xmm0
Judging by how 1 fared, then 2, then 3, I can now see the cost of the C++ abstraction. Perhaps this is what Linus was talking about that day. (No, probably not.) Anyway, hope this helps someone!