What is the fastest / best way to combine arbitrary lanes of registers in AVX / SSE?
Let's say I have a 128-bit register containing some floats [x1, x2, x3, x4] and another holding [y1, y2, y3, y4]. What would be the best way, from a performance standpoint, to get something like [x1, y1, x2, y2]?
I think I could shuffle the registers several times, mask out the lanes I need, and then combine them in several steps, but I was wondering whether I am missing some convenient instruction that would make my life easier. I suppose this is a common need, so I wonder what the best approach is.
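To make the target concrete, here is a minimal sketch of the [x1, y1, x2, y2] case in C with SSE intrinsics. It assumes the lanes are listed from lowest to highest, so x1 is element 0. The helper name `interleave_low` and the array-based interface are only for illustration. For this particular pattern, `_mm_unpacklo_ps` seems to do the interleave in one step, but I don't know whether something similar exists for arbitrary lane combinations.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_unpacklo_ps, ... */

/* Hypothetical helper (name and signature are illustrative only).
 * Assumes lanes are written lowest-first: x = [x1, x2, x3, x4].
 * Writes [x1, y1, x2, y2] to out. */
static void interleave_low(const float x[4], const float y[4], float out[4])
{
    __m128 vx = _mm_loadu_ps(x);          /* unaligned load of x1..x4 */
    __m128 vy = _mm_loadu_ps(y);          /* unaligned load of y1..y4 */

    /* unpacklo interleaves the two low lanes of each operand:
     * result = [vx[0], vy[0], vx[1], vy[1]] */
    __m128 r = _mm_unpacklo_ps(vx, vy);

    _mm_storeu_ps(out, r);                /* unaligned store of the result */
}
```

The upper halves would go through `_mm_unpackhi_ps` in the same way, giving [x3, y3, x4, y4].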
Thanks!