How to optimize for constant division?

Optimization for divided by constant is well optimized by gcc as you know :)

Now I'm wondering how constant division is optimized . gcc doesn't help me, nor does clang.

Perhaps I am not good at such information, but I cannot find material on optimization for the separating constant . (In contrast, division by a constant is well introduced.)

#include <stdio.h>

int f(int x)
{
    // can I optimize off the idiv opcode here?
    return 33659/x;
}

int main()
{
    int x;
    scanf("%d", &x);
    printf("%d", f(x));
    return 0;
}

      

EDIT1:

#include <stdio.h>

#define DIVIDEND 33

void f ( unsigned int* arr, int n )
{
    for ( int i = 0; i < n ; i++ )
    {
        arr[i] = DIVIDEND / arr[i];
    }
}

int main()
{
    const int n = 1024;
    unsigned int buf[n];
    for ( int i = 0; i < n; i++ )
    {
        scanf ( "%u", buf + i );
    }
    f ( buf, n );
    for ( int i = 0; i < n; i++ )
    {
        printf ( "%d", buf[i] );
    }
    return 0;
}

      

Optimized with clang -O3 -march=native div.c -o div

just unrolling for loop, while:

#include <stdio.h>

#define DIVIDEND 33
#define DIVISOR DIVIDEND

void f ( unsigned int* arr, int n )
{
    for ( int i = 0; i < n ; i++ )
    {
        //arr[i] = DIVIDEND / arr[i];
        arr[i] = arr[i] / DIVISOR;
    }
}

int main()
{
    const int n = 1024;
    unsigned int buf[n];
    for ( int i = 0; i < n; i++ )
    {
        scanf ( "%u", buf + i );
    }
    f ( buf, n );
    for ( int i = 0; i < n; i++ )
    {
        printf ( "%d", buf[i] );
    }
    return 0;
}

      

using the same command line, you end up with a bunch of terrifying AVX2 code. (Remember that division by a constant is rewritten in shift + mul + add, which can be vectorized!)

EDIT2: Thanks @ user2722968! Applying RCPPS will speed up program execution.

Here is my experimental implementation using RCPPS for fast constant divisor division:

https://github.com/ThinerDAS/didactic-spoon/blob/master/div.c

However, I'm not sure how to make it more accurate without a lot of overhead.

+3


source to share


1 answer


If you can call out really good optimizations for "divided by", you may need to compute the downside x/33659

with the RCPPS command (which uses SSE / AVX).



+1


source







All Articles