Do compilers usually generate vector (SIMD) instructions even when not explicitly told to do so?

C++17 adds extensions for parallelism to the standard library (for example, std::sort(std::execution::par_unseq, arr, arr + 1000), which allows the sort to be done with multiple threads and with vector instructions).

I noticed that Microsoft's experimental implementation mentions that the VC++ compiler doesn't support vectorization here, which surprises me - I thought modern C++ compilers could reason about loop vectorization, but apparently the VC++ compiler/optimizer cannot generate SIMD code even when explicitly asked to. This apparent lack of automatic vectorization support contradicts the answers to a 2011 question on Quora, which suggest that compilers will vectorize where possible.

Perhaps compilers will only vectorize very obvious cases such as std::array<int, 4>, and nothing more, so the explicit parallelism added in C++17 would be helpful.

Hence, my question: do current compilers automatically vectorize my code even when not explicitly told to do so? (To make this question more specific, let's narrow it down to Intel x86 processors with SIMD support, and the latest versions of GCC, Clang, MSVC, and ICC.)

As an extension: do compilers for other languages do automatic vectorization better (possibly due to language design), such that the C++ standards committee decided explicit (C++17-style) vectorization was necessary?



1 answer


The best compiler at automatically detecting SIMD-style vectorization (when told that it may generate opcodes for the respective instruction sets, of course) is the Intel compiler (which can also generate code that dispatches dynamically depending on the actual processor, if required), followed by GCC and Clang, with MSVC last (of your four).

I realize this is not surprising: Intel is really interested in helping developers take advantage of the latest features they've added to their offerings.

I work closely with Intel, and while they are keen to demonstrate how their compiler can detect opportunities for automatic vectorization, they also quite rightly point out that using their compiler lets you use pragma simd constructs to further indicate assumptions to the compiler that it may or may not be able to deduce on its own (and which aren't clear at a purely syntactic level), thereby allowing the compiler to vectorize code it otherwise couldn't, without resorting to intrinsics.

This, I think, points to the problem with hoping that the compiler (for C++ or any other language) will do all the vectorization work... if you have simple vector-processing loops (for example, multiplying all elements of a vector by a scalar) then yes, you can expect 3 of the 4 compilers to spot that.



But for more complex code, the benefits of vectorization may come not from simply unrolling the loop and combining iterations, but from using a genuinely different or modified algorithm, and that may be difficult, if not impossible, for the compiler to work out entirely on its own. Whereas, if you understand how vectorization might be applied to an algorithm, and you can structure your code so that the compiler can see the opportunity, perhaps with pragma simd or OpenMP constructs, then you may get the results you want.

Vectorization happens when the code has a certain mechanical sympathy with the underlying processor and memory bus - if you have that, then I think the Intel compiler will be your best bet. Without it, changing compilers may make little difference.

May I recommend Matt Godbolt's Compiler Explorer as a way to actually test this - paste your C++ code in there and see what the different compilers actually generate? Very handy... it doesn't include older versions of MSVC (I believe it currently supports VC++ 2017 and later), but it will show you what different versions of ICC, GCC, Clang, and others do with your code.







