C: Writing Code That Can Be Automatically Vectorized, Nested Loop, GCC
I am trying to write some C code that can be vectorized. This is the loop I'm trying to vectorize:
for(jj=0;jj<params.nx;jj++)
    for(kk=0;kk<NSPEEDS;kk++)
        local_density_vec[jj] += tmp_cells_chunk[jj].speeds[kk];
GCC gives me the following message when run with the -ftree-vectorizer-verbose=5 flag: http://pastebin.com/RfCc04aS
How can I rewrite it so that it can be automatically vectorized? NSPEEDS equals 5.
EDIT:
I kept working on this and I seem to be unable to vectorize anything with .speeds[kk]. Is there a way to rearrange it so that it can be vectorized?
for (jj = 0; jj < nx; jj++) {
        partial = 0.0f;
        fp = c[jj].speeds;
        for (kk = 0; kk < M; kk++)
                partial += fp[kk];
        out[jj] = partial;
}
(...)
Calculated minimum iters for profitability: 12
36:   Profitability threshold = 11
Vectorizing loop at autovect.c:36
36: Profitability threshold is 11 loop iterations.
36: LOOP VECTORIZED.
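To try this yourself, here is a minimal compilable sketch of the loop above. The struct name cell_t, the function name sum_speeds, the value M = 16 and the restrict qualifiers are my assumptions for illustration, not taken from the question; restrict is there to promise the compiler that the two arrays do not overlap. Whether GCC actually vectorizes it depends on the version and flags.

```c
#include <assert.h>

#define M 16  /* assumed inner trip count, chosen to be above the threshold in the log */

/* Hypothetical cell type: one small array of speeds per cell. */
typedef struct {
    float speeds[M];
} cell_t;

/* Sum the M speeds of each cell into out[jj].
 * restrict tells the compiler that c and out never alias. */
void sum_speeds(const cell_t *restrict c, float *restrict out, int nx)
{
    int jj, kk;

    assert(nx >= 0);
    for (jj = 0; jj < nx; jj++) {
        float partial = 0.0f;            /* local accumulator, no store per iteration */
        const float *fp = c[jj].speeds;  /* contiguous block of M floats */
        for (kk = 0; kk < M; kk++)
            partial += fp[kk];
        out[jj] = partial;
    }
}
```

Compiling with something like gcc -O3 -funsafe-math-optimizations -ftree-vectorizer-verbose=5 should print a vectorization report for it; newer GCC releases use -fopt-info-vec for the report instead.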
    Important points:
1) In your dump, the loop was flagged as a "complex access pattern" (see the last line of your log). As noted, this is because the compiler cannot rule out aliasing. For the "simple" access patterns it can handle, see http://gcc.gnu.org/projects/tree-ssa/vectorization.html#vectorizab
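2) In my example loop, GCC calculated that at least 12 iterations are needed for vectorization to be profitable (see the log output). Since NSPEEDS == 5, your inner loop is shorter than that, so vectorizing it would most likely just waste time.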
3) I was only able to vectorize my loop after adding -funsafe-math-optimizations. I believe this is needed because vectorizing the sum changes the order of the floating-point additions, and floating-point addition is not associative, so the rounding of the result can differ. See for example: http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems
4) If you change the loop, you may still run into "complex" access patterns. As noted, you may need to change the layout of the array. Check the GCC vectorization docs on strided accesses to see if you can match one of the supported patterns.
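As a rough sketch of what point 4 could look like: instead of an array of cells that each hold NSPEEDS values, keep one contiguous plane per speed, so that the loop over jj walks memory with unit stride. The flat layout speeds[kk * nx + jj] and the function name local_density_soa are assumptions of mine, not something from your code, and I have not checked this against your GCC version.

```c
#include <assert.h>
#include <stddef.h>

#define NSPEEDS 5  /* value given in the question */

/* Hypothetical structure-of-arrays layout:
 * speeds[kk * nx + jj] holds speed kk of cell jj.
 * density[jj] is accumulated into, as in the original loop. */
void local_density_soa(const float *restrict speeds,
                       float *restrict density, int nx)
{
    assert(nx >= 0);
    /* kk outer, jj inner: the inner loop reads and writes contiguous
     * floats and contains no reduction into a single scalar. */
    for (int kk = 0; kk < NSPEEDS; kk++) {
        const float *plane = speeds + (size_t)kk * (size_t)nx;
        for (int jj = 0; jj < nx; jj++)
            density[jj] += plane[jj];
    }
}
```

Each density[jj] still receives its five additions in the same order as in the scalar loop, so this form should not need -funsafe-math-optimizations; the long loop is now the one over nx rather than the one of length 5.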
For completeness, here's a complete example: http://pastebin.com/CWhyqUny
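On point 3, a small experiment shows why the compiler is cautious. The second function below only mimics what a 4-wide vectorized sum does (four partial sums combined at the end); it is my own illustration, not GCC's generated code, and both function names are made up.

```c
#include <assert.h>

/* Plain scalar sum, strictly left to right. */
float sum_left_to_right(const float *x, int n)
{
    float s = 0.0f;

    assert(n >= 0);
    for (int i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Same values, but accumulated in four independent "lanes" that are
 * only combined at the end, the way a 4-wide vector reduction would. */
float sum_four_lanes(const float *x, int n)
{
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    int i;

    assert(n >= 0);
    for (i = 0; i + 4 <= n; i += 4)
        for (int lane = 0; lane < 4; lane++)
            acc[lane] += x[i + lane];
    for (; i < n; i++)  /* leftover elements go into lane 0 */
        acc[0] += x[i];
    return ((acc[0] + acc[1]) + acc[2]) + acc[3];
}
```

With x = {1e8f, 1, 1, 1, -1e8f, 1, 1, 1} and plain single-precision arithmetic (e.g. SSE on x86-64), the first function returns 3 because the three small values are rounded away next to 1e8f, while the second returns 6. Neither order is "wrong", but the results differ, so GCC wants explicit permission before reordering.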