GCC C vector extension: how to check the result of a comparison (for conditional assignment, etc.)?

Background: GCC C's built-in vector extensions allow for a fairly natural representation of SIMD vectors as C types. According to the documentation, many built-in operations (+, -, etc.) are supported. However, the ternary operator as well as the logical operators (&&, ||) only work in C++ for some reason. This is a problem for an all-C codebase.

Question: In GCC C, how can I implement SIMD-friendly [branchless] conditionals of the form:

    v4si a = {2,-1,3,4}, b, indicesLessThan0;
    indicesLessThan0 = a < 0;
    b = indicesLessThan0 ? a : 0;

      

And, more generally, how to execute an arbitrary independent block of statements based on the same result:

    v4si c = {9,8,7,6}, d;
    for (int i = 0; i < 4; i++) {
      if (indicesLessThan0[i]) { // consider the tests one by one
         b[i] = a[i];     // as the ternary operator does above
         d[i] = c[i] + 1; // some other independent operation
      }
      else {
         b[i] = 0;        // as the ternary operator does above
         d[i] = c[i] - 1; // another independent operation
      }
    }

      

If building such a statement block is too hard (SIMD branching is bad), it would be fine to run the ternary test again for each additional statement, at the cost of (presumably) some efficiency:

    d = indicesLessThan0 ? c + 1 : c - 1; // the other operation in the loop

      

But the ternary operator doesn't work in C for some reason the manual doesn't explain. Is there another easy way? Some way to use if statements?

1 answer


I found 3 solutions after throwing the kitchen sink at the code.

  • Switch to g++. Not too complicated, and it turns out that most of the code can be converted by simply placing a (type *) cast in front of all the *alloc calls. Then I can just do:

    v16s8 condStor = test ? a : b;

  • Better yet, I found that you can just bit-bash using different mixes of & and |, just like everyone else does with bits inside integers. The trick is that vector comparisons set each true lane to all ones, 11111111... (i.e. -1), which makes the values stick when you apply the bitwise operators.

  • Better yet, "type punning 101" with an intrinsic:
    v16s8 condStor = b; __builtin_ia32_maskmovdqu (a, test, (char *)(&condStor));


    This uses an instruction designed to do what #2 does in one fell swoop.
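
The bitwise approach from the second bullet can be sketched in plain C as follows. This is a minimal sketch rather than the answerer's exact code: the v4si typedef and the helper names v4si_select and blend_lanes are assumptions made for illustration. It relies on GCC's vector extensions, where a comparison yields -1 (all bits set) in each true lane and 0 in each false lane.

```c
#include <stdint.h>

/* Assumed typedef: four 32-bit signed lanes in a 16-byte vector. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Hypothetical helper: per lane, pick a where mask is -1 and b where it is 0. */
static inline v4si v4si_select(v4si mask, v4si a, v4si b)
{
    return (mask & a) | (~mask & b);
}

/* Branch-free version of the per-lane loop from the question:
 * both "branches" are computed for every lane, then blended by the mask. */
static inline void blend_lanes(v4si a, v4si c, v4si *b, v4si *d)
{
    const v4si zero = {0, 0, 0, 0};
    v4si indicesLessThan0 = a < zero;                  /* -1 where a[i] < 0 */

    *b = v4si_select(indicesLessThan0, a, zero);       /* a[i] or 0 */
    *d = v4si_select(indicesLessThan0, c + 1, c - 1);  /* c[i]+1 or c[i]-1 */
}
```

With a = {2,-1,3,4} and c = {9,8,7,6} from the question, this gives b = {0,-1,0,0} and d = {8,9,6,5}. The comparison runs once and the mask is reused, so each extra statement costs only a few bitwise operations instead of another test.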


Not convinced? Check the generated assembly:

  • pxor    %xmm1, %xmm1
    movdqa  -64(%rbp), %xmm0
    pcmpeqb %xmm1, %xmm0
    pcmpeqd %xmm1, %xmm1
    pandn   %xmm1, %xmm0
    pxor    %xmm1, %xmm1
    pcmpgtb %xmm0, %xmm1
    movdqa  %xmm1, %xmm0
    movdqa  -32(%rbp), %xmm2
    movdqa  -16(%rbp), %xmm1
    pand    %xmm0, %xmm1
    pandn   %xmm2, %xmm0
    por     %xmm1, %xmm0
    movaps  %xmm0, -80(%rbp)
    
          

  • movdqa  -64(%rbp), %xmm0
    movdqa  %xmm0, %xmm1
    pand    -16(%rbp), %xmm1
    pcmpeqd %xmm0, %xmm0
    pxor    -64(%rbp), %xmm0
    pand    -32(%rbp), %xmm0
    por     %xmm1, %xmm0
    movaps  %xmm0, -80(%rbp)
    
          

  • movdqa  -32(%rbp), %xmm0
    movaps  %xmm0, -80(%rbp)
    leaq    -80(%rbp), %rax
    movdqa  -16(%rbp), %xmm0
    movdqa  -64(%rbp), %xmm1
    movq    %rax, %rdi
    maskmovdqu  %xmm1, %xmm0
    
          

    Judging by how 1 compiled, then 2, then 3, I can now see the cost of the C++ abstraction. Perhaps this is what Linus was talking about that day. (No, probably not.) Anyway, hope this helps someone!
