What is the fastest / best way to merge elements from two registers in AVX / SSE?

Let's say I have a 128-bit register containing four floats [x1, x2, x3, x4] and another holding [y1, y2, y3, y4]. What would be the best way, from a performance standpoint, to get something like [x1, y1, x2, y2]?

I think I could shuffle the registers several times, use temporaries, and then combine them in several steps, but I was wondering whether I am missing some convenient instruction that would make my life easier. I suppose this is a common operation, so I wonder what the best approach is.

Thanks!



1 answer


In this particular case, you can do it with one instruction:

z = _mm_unpacklo_ps(x, y);  // z = [x1, y1, x2, y2]
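As a minimal sketch of that one-instruction approach: because the question's data is float, this uses _mm_unpacklo_ps, the single-precision form of the unpack-low intrinsic (the _epi32 form operates on __m128i integer registers). The helper name interleave_lo and the use of unaligned loads and stores are assumptions made for illustration; they are not part of the original answer.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_unpacklo_ps, _mm_storeu_ps */

/* Hypothetical helper: interleave the two low elements of x and y.
 * Given x = [x1, x2, x3, x4] and y = [y1, y2, y3, y4],
 * writes [x1, y1, x2, y2] to out. */
static void interleave_lo(const float x[4], const float y[4], float out[4])
{
    __m128 vx = _mm_loadu_ps(x);          /* unaligned load of x */
    __m128 vy = _mm_loadu_ps(y);          /* unaligned load of y */
    __m128 vz = _mm_unpacklo_ps(vx, vy);  /* low halves, interleaved */
    _mm_storeu_ps(out, vz);               /* unaligned store of the result */
}
```

If the values are already in registers, only the _mm_unpacklo_ps call is needed; the loads and the store are there so the result can be inspected from ordinary arrays.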

      



The _mm_unpacklo_xxx / _mm_unpackhi_xxx intrinsics can be very useful for various data reorganization operations. For more general cases, there are also the _mm_shuffle_xxx intrinsics.
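To make the other two families concrete, here is a small sketch for the float case using _mm_unpackhi_ps and _mm_shuffle_ps. The helper names interleave_hi and concat_lo are hypothetical, and the particular shuffle mask is just one example of a selection, not something taken from the answer.

```c
#include <xmmintrin.h>  /* SSE: _mm_unpackhi_ps, _mm_shuffle_ps, _MM_SHUFFLE */

/* Hypothetical helper: interleave the two high elements of x and y.
 * Given x = [x1, x2, x3, x4] and y = [y1, y2, y3, y4],
 * writes [x3, y3, x4, y4] to out. */
static void interleave_hi(const float x[4], const float y[4], float out[4])
{
    __m128 vx = _mm_loadu_ps(x);
    __m128 vy = _mm_loadu_ps(y);
    _mm_storeu_ps(out, _mm_unpackhi_ps(vx, vy));
}

/* Hypothetical helper: pick elements with a shuffle mask.
 * _mm_shuffle_ps takes result lanes 0-1 from its first operand and
 * lanes 2-3 from its second operand. _MM_SHUFFLE lists the source
 * indices from the highest result lane down to the lowest, so
 * _MM_SHUFFLE(1, 0, 1, 0) yields [x1, x2, y1, y2]. */
static void concat_lo(const float x[4], const float y[4], float out[4])
{
    __m128 vx = _mm_loadu_ps(x);
    __m128 vy = _mm_loadu_ps(y);
    _mm_storeu_ps(out, _mm_shuffle_ps(vx, vy, _MM_SHUFFLE(1, 0, 1, 0)));
}
```

Note that _mm_shuffle_ps cannot produce [x1, y1, x2, y2] on its own, because the two low result lanes always come from the first operand; that is why the unpack intrinsics are the natural fit for interleaving.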
