How to check for inf with AVX intrinsics (__m256)
What is the best way to check whether an AVX __m256 (a vector of 8 float) contains any inf? I tried
__m256 X=_mm256_set1_ps(1.0f/0.0f);
_mm256_cmp_ps(X,X,_CMP_EQ_OQ);
but this compares as true. Note that this method does detect nan (which compares as false). So one way is to check X!=nan && 0*X==nan:
__m256 Y=_mm256_mul_ps(X,_mm256_setzero_ps()); // 0*X=nan if X=inf
_mm256_andnot_ps(_mm256_cmp_ps(Y,Y,_CMP_EQ_OQ),
_mm256_cmp_ps(X,X,_CMP_EQ_OQ));
However, this looks somewhat lengthy. Is there a faster way?
If you want to check if a vector has any infinities:
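#include <immintrin.h>
#include <limits>
bool has_infinity(__m256 x){
    const __m256 SIGN_MASK = _mm256_set1_ps(-0.0f); // only the sign bit set
    const __m256 INF = _mm256_set1_ps(std::numeric_limits<float>::infinity());
    x = _mm256_andnot_ps(SIGN_MASK, x);    // clear the sign bits: |x|
    x = _mm256_cmp_ps(x, INF, _CMP_EQ_OQ); // all-ones in lanes where |x| == inf
    return _mm256_movemask_ps(x) != 0;     // true if any lane matched
}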
If you want a vector mask of infinite values:
#include <immintrin.h>
#include <limits>
__m256 is_infinity(__m256 x){
    const __m256 SIGN_MASK = _mm256_set1_ps(-0.0f); // only the sign bit set
    const __m256 INF = _mm256_set1_ps(std::numeric_limits<float>::infinity());
    x = _mm256_andnot_ps(SIGN_MASK, x);    // clear the sign bits: |x|
    x = _mm256_cmp_ps(x, INF, _CMP_EQ_OQ); // all-ones in lanes where |x| == inf
    return x;
}
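As an illustration of what the vector mask is useful for, here is a small sketch that zeroes out the infinite lanes of 8 floats in place. The name zero_infinities and the pointer interface are assumptions for the example, and the target attribute is a GCC/Clang extension so it builds without -mavx.

```cpp
#include <immintrin.h>
#include <limits>

// Hypothetical helper: replace any +inf / -inf among the 8 floats at p with 0.0f.
__attribute__((target("avx")))  // GCC/Clang extension: enable AVX for this function only
void zero_infinities(float* p) {
    const __m256 sign_mask = _mm256_set1_ps(-0.0f);
    const __m256 inf = _mm256_set1_ps(std::numeric_limits<float>::infinity());
    __m256 x = _mm256_loadu_ps(p);
    __m256 abs_x = _mm256_andnot_ps(sign_mask, x);          // |x|
    __m256 is_inf = _mm256_cmp_ps(abs_x, inf, _CMP_EQ_OQ);  // all-ones where |x| == inf
    _mm256_storeu_ps(p, _mm256_andnot_ps(is_inf, x));       // keep x only where it is not inf
}
```

Finite lanes pass through unchanged, since the mask is all-zero there.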
I think the best solution is to use vptest instead of vmovmskps.
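bool has_infinity(const __m256 &x) {
    __m256 s = _mm256_andnot_ps(_mm256_set1_ps(-0.0f), x);             // clear the sign bits: |x|
    __m256 cmp = _mm256_cmp_ps(s, _mm256_set1_ps(1.0f/0.0f), _CMP_EQ_OQ); // _CMP_EQ_OQ == 0
    __m256i cmpi = _mm256_castps_si256(cmp);                           // reinterpret only, no instruction
    return !_mm256_testz_si256(cmpi, cmpi);                            // vptest: ZF set if cmpi is all-zero
}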
The cast intrinsic is there just to make the compiler happy: "This intrinsic is only used for compilation and does not generate any instructions, so it has zero latency."
vptest is superior to vmovmskps because it sets the zero flag and vmovmskps does not. With vmovmskps the compiler must generate a test instruction to set the zero flag.
I had an idea, but unfortunately it only helps if you want to check that ALL elements are infinite.
With AVX2 you can check that all elements are infinite with PTEST. I got the idea of using xor to compare for equality from EOF's comment on this question, which I used for my answer there. I thought I could make a shorter version of test-for-any-inf, but of course pxor only works as a test for all 256b being equal.
#include <immintrin.h>
#include <limits>
bool all_infinity(__m256 x){
    const __m256i SIGN_MASK = _mm256_set1_epi32(0x7FFFFFFF); // -0.0f inverted
    const __m256i INF = _mm256_castps_si256(_mm256_set1_ps(std::numeric_limits<float>::infinity()));
    // the integer xor needs __m256i operands, so cast x (the cast generates no instruction)
    __m256i diff = _mm256_xor_si256(_mm256_castps_si256(x), INF); // other than the sign bit, all-zero only if all the bits match
    return _mm256_testz_si256(diff, SIGN_MASK); // flags are ready to branch on directly
}
With AVX512 there is __mmask8 _mm512_fpclass_pd_mask (__m512d a, int imm8) (vfpclasspd). (See the Intel manual.) Its output is a mask register, and I haven't looked into testing / branching on a value there. But you can test for any / all of +/- zero, +/- inf, Q / S NaN, Denormal, Negative.