C ++ nested looping of codes
I would like to know under what conditions the invariant parts of nested loops can be optimized.
For this, I wrote two functions, one of which implements the factorization of three nested loops, and the other does not.
An unfactorized function looks like this:
template<int k>
double __attribute__ ((noinline)) evaluate(const double u[], const double phi[])
{
double f = 0.;
for (int i3 = 0;i3<k;++i3)
for (int i2 = 0;i2<k;++i2)
for (int i1 = 0;i1<k;++i1)
f += u[i1+k*(i2+k*i3)] * phi[i1] * phi[i2] * phi[i3];
return f;
}
Whereas the factorized function is:
template<int k>
double __attribute__ ((noinline)) evaluate_fact(const double u[], const double phi[])
{
double f3 = 0.;
for (int i3 = 0;i3<k;++i3)
{
double f2 = 0.;
for (int i2 = 0;i2<k;++i2)
{
double f1 = 0.;
for (int i1 = 0;i1<k;++i1)
{
f1 += u[i1+k*(i2+k*i3)] * phi[i1];
}
f2 += f1 * phi[i2];
}
f3 += f2 * phi[i3];
}
return f3;
}
Which I am calling with the following main:
int main()
{
const static unsigned int k=20;
double u[k*k*k];
double phi[k];
phi[0] = 1.;
for (unsigned int i=1;i<k;++i)
phi[i] = phi[i-1]*.333;
double e = 0.;
for (unsigned int i=0;i<1000;++i)
{
e += evaluate<k>(u, phi);
//e += evaluate_fact<k>(u, phi);
}
std::cout << "Evaluate " << e << std::endl;
}
For small k, both functions generate the same assembly code, but after a certain size k ~ = 10, the assembly does not look the same and callgrind shows more operations being performed in the unfactorized version.
How should I write my code (if at all possible), or what should I tell GCC that optim () is optimized for evaluating_fact () ???
I am using GCC 7.1.0. with flags -Ofast -fmove-loop-invariants
Using the -funroll-loop doesn't help unless I add --param max-full-peeled-insns = 10000 --param max-full-peel-times = 10000, but that's a completely different thing because it's basically unrolling everything , the assembly is extensive.
Using -fassociative-math doesn't help.
This article states that: "The traditional looping of code, which is typically used by general-purpose compilers, only checks for invariance with the innermost loop." Does this relate to my code?
Thank!
source to share
No one has answered this question yet
See similar questions:
or similar: