GCC C vector extensions: how to test the result of a comparison (for conditional assignment, etc.)?
Background: GCC's builtin vector extensions for C allow a fairly natural representation of SIMD vectors as C types. According to the documentation, many builtin operations (+, -, etc.) are supported. However, the ternary operator, as well as the logical operators (&&, ||), only work in C++ for some reason. This is a nuisance for an all-C codebase.
Question: In GCC C, how can I implement SIMD-compatible [branchless] conditionals of the form:
v4si a = {2,1,3,4}, b, indicesLessThan0;
indicesLessThan0 = a < 0;
b = indicesLessThan0 ? a : 0;
And, more generally, how can I execute an arbitrary block of independent statements based on the same result:
v4si c = {9,8,7,6}, d;
for (int i = 0; i < 4; i++) {
if (indicesLessThan0[i]) { // consider tests one by one
b[i] = a[i]; // as the ternary operator does above
d[i] = c[i] + 1; // some other independent operation
}
else {
b[i] = 0; // as the ternary operator does above
d[i] = c[i] - 1; // another independent operation
}
}
If the statement block gets too complex (branching is bad for SIMD), it would be acceptable to repeat the ternary select for each additional statement, at the (presumed) cost of some efficiency:
d = indicesLessThan0 ? c + 1 : c - 1; // the other operation in the loop
But the ternary operator doesn't work in C, for some reason the manual doesn't explain. Is there another easy way? Some way to use if statements?
I found 3 solutions after throwing the kitchen sink at the code.

1. Switch to g++. It's not too hard: it turns out that most of the code can be converted just by putting a (type *) cast in front of every allocation call. Then I can simply do:
v16s8 condStor = test ? a : b;

2. Better yet, I found that you can just bit-bash with various mixes of & and | (plus ~), just as everyone does with the bits inside integers. The trick is that vector comparisons set every true lane to all ones, 11111111... (-1, i.e. all bits set), which makes the selected values stick when combined with bitwise operators.
3. Better still, "type punning 101" with a builtin function:
v16s8 condStor = b;
__builtin_ia32_maskmovdqu(a, test, (char *)(&condStor));
This uses a function designed to do in one fell swoop what #2 does.
Not convinced? Check the generated assembly:

1 (g++, ternary operator):
pxor %xmm1, %xmm1
movdqa 64(%rbp), %xmm0
pcmpeqb %xmm1, %xmm0
pcmpeqd %xmm1, %xmm1
pandn %xmm1, %xmm0
pxor %xmm1, %xmm1
pcmpgtb %xmm0, %xmm1
movdqa %xmm1, %xmm0
movdqa 32(%rbp), %xmm2
movdqa 16(%rbp), %xmm1
pand %xmm0, %xmm1
pandn %xmm2, %xmm0
por %xmm1, %xmm0
movaps %xmm0, 80(%rbp)

2 (bitwise &, |, ~):
movdqa 64(%rbp), %xmm0
movdqa %xmm0, %xmm1
pand 16(%rbp), %xmm1
pcmpeqd %xmm0, %xmm0
pxor 64(%rbp), %xmm0
pand 32(%rbp), %xmm0
por %xmm1, %xmm0
movaps %xmm0, 80(%rbp)

3 (__builtin_ia32_maskmovdqu):
movdqa 32(%rbp), %xmm0
movaps %xmm0, 80(%rbp)
leaq 80(%rbp), %rax
movdqa 16(%rbp), %xmm0
movdqa 64(%rbp), %xmm1
movq %rax, %rdi
maskmovdqu %xmm1, %xmm0
Judging by how the code shrinks from 1 to 2 to 3 (14, then 8, then 7 instructions), I can now see the cost of the C++ abstraction. Perhaps this is what Linus was talking about. (No, probably not.) Anyway, hope this helps someone!