GCC C vector extension: how to check the result of a comparison (for conditional assignment, etc.)?

Background: GCC C's built-in vector extensions allow for a fairly natural representation of SIMD vectors as C types. According to the documentation, many built-in operations (+, -, etc.) are supported. However, the ternary operator and the logical operators (&&, ||) only work in C++ for some reason. This is a problem for an all-C codebase.

Question: In GCC C, how do I implement SIMD-compatible [branching] conditionals of the form:

    v4si a = {2,-1,3,4}, b, indicesLessThan0;
    indicesLessThan0 = a < 0;
    b = indicesLessThan0 ? a : 0;


And, more generally, how to execute an arbitrary independent block of statements based on the same result:

    v4si c = {9,8,7,6}, d;
    for (int i = 0; i < 4; i++) {
      if (indicesLessThan0[i]) { // consider the tests one by one
        b[i] = a[i];     // as the ternary operator does above
        d[i] = c[i] + 1; // some other independent operation
      } else {
        b[i] = 0;        // as the ternary operator does above
        d[i] = c[i] - 1; // another independent operation
      }
    }


If the block of statements is made branchless (SIMD branching is bad), it might be fine to run the ternary test again for each additional statement, costing (presumably) some efficiency:

    d = indicesLessThan0 ? c + 1 : c - 1; // the other operation in the loop


But the ternary operator doesn't work in C for some reason the manual doesn't explain. Is there another easy way? Some way to use if statements?



1 answer

I found 3 solutions after throwing the kitchen sink at the code.

  • Switch to g++. Not too complicated, and it turns out that most of the code can be converted by simply placing a (type *) cast in front of all the *alloc calls. Then I can just do:

    v16s8 condStor = test ? a : b;

  • Better yet, I found that you can just bit-bash with different combinations of & and |, just like everyone else does with bits inside integers. The trick is that a vector comparison sets every true element to all ones, 11111111... (that is, -1), which makes the selected values stick when the bitwise operators are applied.

  • Better yet, "type punning 101" with an intrinsic:
    v16s8 condStor = b;
    __builtin_ia32_maskmovdqu (a, test, (char *)(&condStor));

    This uses an instruction designed to do what solution 2 does in one fell swoop.
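The bitwise approach from the second solution can be sketched in plain C roughly as follows. This is only an illustration, not the answerer's actual code: the helper names `select_v4si` and `split_by_sign` are made up here, the `v4si` typedef is assumed to be four 32-bit signed lanes as in the question, and the sketch relies on every mask lane being exactly 0 or -1, which is what a GCC vector comparison produces.

```c
#include <stdint.h>

/* Four 32-bit signed lanes, assumed to match the v4si type in the question. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Hypothetical helper (not a GCC built-in): per-lane "mask ? x : y".
   Assumes each lane of mask is either 0 or -1 (all bits set). */
static inline v4si select_v4si(v4si mask, v4si x, v4si y)
{
    return (mask & x) | (~mask & y);
}

/* Branchless version of the loop in the question: the comparison is done
   once and the resulting mask is reused for both independent operations. */
static inline void split_by_sign(v4si a, v4si c, v4si *b, v4si *d)
{
    v4si zero = {0, 0, 0, 0};
    v4si mask = a < zero;                 /* -1 where a[i] < 0, else 0 */
    *b = select_v4si(mask, a, zero);      /* b = mask ? a : 0          */
    *d = select_v4si(mask, c + 1, c - 1); /* d = mask ? c + 1 : c - 1  */
}
```

With the question's values a = {2,-1,3,4} and c = {9,8,7,6}, this yields b = {0,-1,0,0} and d = {8,9,6,5}. Both sides of each select are always computed, so it only suits operations that are cheap and free of side effects.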

Not convinced? Check the generated assembly (in the same order as the solutions above):

  • pxor    %xmm1, %xmm1
    movdqa  -64(%rbp), %xmm0
    pcmpeqb %xmm1, %xmm0
    pcmpeqd %xmm1, %xmm1
    pandn   %xmm1, %xmm0
    pxor    %xmm1, %xmm1
    pcmpgtb %xmm0, %xmm1
    movdqa  %xmm1, %xmm0
    movdqa  -32(%rbp), %xmm2
    movdqa  -16(%rbp), %xmm1
    pand    %xmm0, %xmm1
    pandn   %xmm2, %xmm0
    por     %xmm1, %xmm0
    movaps  %xmm0, -80(%rbp)

  • movdqa  -64(%rbp), %xmm0
    movdqa  %xmm0, %xmm1
    pand    -16(%rbp), %xmm1
    pcmpeqd %xmm0, %xmm0
    pxor    -64(%rbp), %xmm0
    pand    -32(%rbp), %xmm0
    por     %xmm1, %xmm0
    movaps  %xmm0, -80(%rbp)

  • movdqa  -32(%rbp), %xmm0
    movaps  %xmm0, -80(%rbp)
    leaq    -80(%rbp), %rax
    movdqa  -16(%rbp), %xmm0
    movdqa  -64(%rbp), %xmm1
    movq    %rax, %rdi
    maskmovdqu  %xmm1, %xmm0

    Judging by how solution 1 compiled, then 2, then 3, I can now see the cost of the C++ abstraction. Perhaps this is what Linus was talking about that day. (No, probably not.) Anyway, hope this helps someone!


