Performance issues with simple computing
EDIT 2: Reduce program computation time by 16%! See bottom for calculation
I wrote an N-body simulator that implements the Barnes-Hut algorithm. Now I have an innocent function called CheckNode
. It's simple and doesn't take a lot of computation time, but the problem is that it is called millions of times, so it takes most of the computation time between each frame.
I have profiled the code and this function is responsible for 84.58%
the total computation time, and this is only with 10K particles, when I do it with 10x precision, this function uses more and more percentage.
Now here's a function, with the percentage of time spent on the right in red.
Now, there are some disturbing things here. As a simple if statement taking 9.17%
, and another if statement is over 20% of the computation time! Is there even the slightest optimization that can be done here that will be multiplied by millions of function calls to make my program run faster?
EDIT:
Here's the function CalculateForceNode
:
void CalculateForceNode(Body* bi, Node* bj) //bi is being attracted to bj. 15 flops of calculation
{
//vector from the body to the center of mass
double vectorx = bj->CenterOfMassx - bi->posX;
double vectory = bj->CenterOfMassy - bi->posY;
//c^2 = a^2 + b^2 + softener^2
double distSqr = vectorx * vectorx + vectory * vectory + Softener * Softener;
// ivnDistCube = 1/distSqr^(3/2)
double distSixth = distSqr * distSqr * distSqr;
double invDistCube = 1.0f / (sqrt(distSixth));
double Accel = (bj->TotalMass * invDistCube * _GRAV_CONST);
bi->AccelX += vectorx * Accel;
bi->AccelY += vectory * Accel;
}
EDIT 2:
Optimization results
Now the function CheckNode
takes 82.03%
all the computation time (measured during 1 min. 37 s of the sample), in contrast to the previously taken one 84.58%
.
Now the logic says that the remaining 15%
computation time coincides with the remaining computation time of the 18%
second program. Thus, these identical periods (the same code) took the 15%
first program and the 18%
second program. Allowing time for this other code to complete x
, the first program took 1/0.15
= 6.666x
and the second took 1/0.18
= 5.555x
. Then you can find the fraction that 5.555x
has 6.666x
, which is calculated as ~0.83
, and for that there was ( 1 - 0.83 = 0.16
) 16% reduction in program calculation time!
source to share
The first thing I'll try is to change the elements in one of your conditions, replace:
if(withSqr / distanceSqr < nodeThresholdSqr || pNode->HasChildren == false)
from:
if(pNode->HasChildren == false || (withSqr / distanceSqr < nodeThresholdSqr))
If the first part of the condition is true pNode->HasChildren == false
and the second (withSqr / distanceSqr < nodeThresholdSqr)
will never be met (read: evaluated). Simple conditional checking is much faster than floating point operations (division in your case). You can even go to the next level: * you need to calculate distanceSqr
AT ALL when pNode->HasChildren == false
?
EDIT: even better:
if(pNode->HasChildren == false)
{
CalculateForceNode(pBody,pNode);
}
else
{
double distanceSqr = ((diffX * diffX) + (diffY * diffY));
double withSqr = pNode->width * pNode->width;
if(withSqr / distanceSqr < nodeThresholdSqr)
{
CalculateForceNode(pBody,pNode);
}
else
{//if not, repeat function with child
if(pNode->Child[0]->Bodies.size() > 0)
CheckNode(pNode->Child[0],pBody);
//..... - all the rest of your code
}
}
source to share
Time-based profiling is not enough, you need to know what time wasted - in other words, use a more advanced profiler.
It also doesn't mention any information about the compiler or platform you are using.
For an if statement using 9% of the time, I don't think it's being compared, it's wasted fetching data. You have multiple levels of indirection (accessing data using a pointer that takes you to another pointer, etc.). This is bad for caching and branch prediction and I think you are wasting time fetching data from memory or useless computation due to branch miss prediction without doing the actual comparison.
another note I noticed: if (pNode-> HasChildren == false) you don't need all the calculations you did to find the widthSqr. I think you should rebuild your logic to check this first, if the condition is false, then you can calculate the widthSqr and continue your logic.
source to share
- You have to
inline
function first,Bodies.size()
or accesssize
directly so that there is no overhead when calling the function (it takes a while to push all the necessary information onto the stack and pop out). - I don't see all the code, but it looks like you can do a preliminary calculation
widthSqr
. It can be computed ifwidth
not assigned in a function. - You are using a lot of pointers here and it looks like your structures are scattered all over the memory. This will result in a large cache dip in the processor. Make sure that all of the data required for the calculation is in one long, contiguous and compact memory area.
- The
CalculateForceNode
check may be whether the pre-calculatedSoftener*Softener
. The function issqrt
very time consuming. The algorithmsqrt
is iterative, so you can sacrifice precision for speed by doing fewer iterations, or you can use lookup tables. -
You do the same calculations twice in
CalculateForceNode
.void CalculateForceNode(Body* bi, Node* bj) { //vector from the body to the center of mass double vectorx = bj->CenterOfMassx - bi->posX; double vectory = bj->CenterOfMassy - bi->posY; //c^2 = a^2 + b^2 + softener^2 double distSqr = vectorx * vectorx + vectory * vectory...
vectorx,vectory and distSqr
have already been calculated in CheckNode
how diffX, diffY and distanceSqr
. Manually inline whole function CalculateForceNode
.
source to share
Since the function is called many times, you should get rid of the call overhead by CalculateForceNode(...)
manually entering the code. In this case, you will notice other tricks:
void CheckNode(Node* pNode, Body* pBody)
{
double diffX = (pNode->CenterOfMass - pBody->posX);
double diffY = (pNode->CenterOfMass - pBody->posY);
double distanceSqr = ((diffX * diffX) + (diffY * diffY));
double widthSqr = pNode->width * pNode->width;
if (widthSqr / distanceSqr < NodeThresholdSqr || pNode->hasChildren == false)
{
//vector from the body to the center of mass
double vectorx = pNode->CenterOfMassx - pBody->posX;
double vectory = pNode->CenterOfMassy - pBody->posY;
//c^2 = a^2 + b^2 + softener^2
double distSqr = vectorx * vectorx + vectory * vectory + Softener * Softener;
// ivnDistCube = 1/distSqr^(3/2)
double distSixth = distSqr * distSqr * distSqr;
double invDistCube = 1.0f / (sqrt(distSixth));
double Accel = (pNode->TotalMass * invDistCube * _GRAV_CONST);
pBody->AccelX += vectorx * Accel;
pBody->AccelY += vectory * Accel;
}
else
{
CheckChildren(pNode, pBody);
}
}
Now you can see that diffX = vectorx
, diffY = vectory
, distSqr = distanceSqr*Softner*Softner
. Reusing some of the calculations already done and pre-calculating what is possible should save you a few cycles:
void CheckNode(Node* pNode, Body* pBody)
{
double diffX = (pNode->CenterOfMass - pBody->posX);
double diffY = (pNode->CenterOfMass - pBody->posY);
double distanceSqr = ((diffX * diffX) + (diffY * diffY));
double widthSqr = pNode->width * pNode->width;
double SoftnerSq = Softener * Softener; //precompute this value
if (widthSqr / distanceSqr < NodeThresholdSqr || pNode->hasChildren == false)
{
//c^2 = a^2 + b^2 + softener^2
double distSqr = distanceSqr + SoftnerSq;
// ivnDistCube = 1/distSqr^(3/2)
double distSixth = distSqr * distSqr * distSqr;
double invDistCube = 1.0f / (sqrt(distSixth));
double Accel = (pNode->TotalMass * invDistCube * _GRAV_CONST);
pBody->AccelX += diffX * Accel;
pBody->AccelY += diffY * Accel;
}
else
{
CheckChildren(pNode, pBody);
}
}
Hope this works for you.
source to share
Change the if statement and move all your calculations inside the part pNode->hasChildren == false
:
void CheckChildren(Node* pNode, Body* pBody)
{
if (pNode->Child[0]->Bodies.size() > 0)
CheckNode(...
}
void CheckNode(Node* pNode, Body* pBody)
{
if (pNode->hasChildren != false)
{
double diffX = (pNode->CenterOfMass - pBody->posX);
double diffY = (pNode->CenterOfMass - pBody->posY);
double distanceSqr = ((diffX * diffX) + (diffY * diffY));
double widthSqr = pNode->width * pNode->width;
if (widthSqr / distanceSqr < NodeThresholdSqr)
{
CalculateForceNode(pBody, pNode);
}
else
{
CheckChildren(pNode, pBody);
}
}
else
{
CheckChildren(pNode, pBody);
}
}
source to share