C ++ nested looping of codes

I would like to know under what conditions the invariant parts of nested loops can be optimized.

For this, I wrote two functions, one of which implements the factorization of three nested loops, and the other does not.

An unfactorized function looks like this:

template<int k>
double __attribute__ ((noinline)) evaluate(const double u[], const double phi[])
{
  double f = 0.;
  for (int i3 = 0;i3<k;++i3)
    for (int i2 = 0;i2<k;++i2)
      for (int i1 = 0;i1<k;++i1)
        f += u[i1+k*(i2+k*i3)] * phi[i1] * phi[i2] * phi[i3];
  return f;
}

      

Whereas the factorized function is:

template<int k>
double __attribute__ ((noinline)) evaluate_fact(const double u[], const double phi[])
{
  double f3 = 0.;
  for (int i3 = 0;i3<k;++i3)
    {
      double f2 = 0.;
      for (int i2 = 0;i2<k;++i2)
        {
          double f1 = 0.;
          for (int i1 = 0;i1<k;++i1)
            {
              f1 += u[i1+k*(i2+k*i3)] * phi[i1];
            }
          f2 += f1  * phi[i2];
        }
      f3 += f2 * phi[i3];
    }
  return f3;
}

      

Which I am calling with the following main:

int main()
{
  const static unsigned int k=20;
  double u[k*k*k];
  double phi[k];

  phi[0] = 1.;
  for (unsigned int i=1;i<k;++i)
    phi[i] = phi[i-1]*.333;

  double e = 0.;
  for (unsigned int i=0;i<1000;++i)
    {
      e += evaluate<k>(u, phi);
      //e += evaluate_fact<k>(u, phi);
    }
  std::cout << "Evaluate " << e << std::endl;
}

      

For small k, both functions generate the same assembly code, but after a certain size k ~ = 10, the assembly does not look the same and callgrind shows more operations being performed in the unfactorized version.

How should I write my code (if at all possible), or what should I tell GCC that optim () is optimized for evaluating_fact () ???

I am using GCC 7.1.0. with flags -Ofast -fmove-loop-invariants

Using the -funroll-loop doesn't help unless I add --param max-full-peeled-insns = 10000 --param max-full-peel-times = 10000, but that's a completely different thing because it's basically unrolling everything , the assembly is extensive.

Using -fassociative-math doesn't help.

This article states that: "The traditional looping of code, which is typically used by general-purpose compilers, only checks for invariance with the innermost loop." Does this relate to my code?

Thank!

+3


source to share





All Articles