Performance issues with simple computing

EDIT 2: Reduce program computation time by 16%! See bottom for calculation


I wrote an N-body simulator that implements the Barnes-Hut algorithm. Now I have an innocent function called CheckNode

. It's simple and doesn't take a lot of computation time, but the problem is that it is called millions of times, so it takes most of the computation time between each frame.

I have profiled the code and this function is responsible for 84.58%

the total computation time, and this is only with 10K particles, when I do it with 10x precision, this function uses more and more percentage.


Now here's a function, with the percentage of time spent on the right in red.

enter image description here


Now, there are some disturbing things here. As a simple if statement taking 9.17%

, and another if statement is over 20% of the computation time! Is there even the slightest optimization that can be done here that will be multiplied by millions of function calls to make my program run faster?

EDIT:

Here's the function CalculateForceNode

:

void CalculateForceNode(Body* bi, Node* bj)  //bi is being attracted to bj. 15 flops of calculation
{
    //vector from the body to the center of mass
    double vectorx = bj->CenterOfMassx - bi->posX;
    double vectory = bj->CenterOfMassy - bi->posY;

    //c^2 = a^2 + b^2 + softener^2
    double distSqr = vectorx * vectorx + vectory * vectory + Softener * Softener;

    // ivnDistCube = 1/distSqr^(3/2)
    double distSixth = distSqr * distSqr * distSqr;
    double invDistCube = 1.0f / (sqrt(distSixth));

    double Accel = (bj->TotalMass * invDistCube * _GRAV_CONST);

    bi->AccelX += vectorx * Accel;
    bi->AccelY += vectory * Accel;
}

      

EDIT 2:

Optimization results

Now the function CheckNode

takes 82.03%

all the computation time (measured during 1 min. 37 s of the sample), in contrast to the previously taken one 84.58%

.

Now the logic says that the remaining 15%

computation time coincides with the remaining computation time of the 18%

second program. Thus, these identical periods (the same code) took the 15%

first program and the 18%

second program. Allowing time for this other code to complete x

, the first program took 1/0.15

= 6.666x

and the second took 1/0.18

= 5.555x

. Then you can find the fraction that 5.555x

has 6.666x

, which is calculated as ~0.83

, and for that there was ( 1 - 0.83 = 0.16

) 16% reduction in program calculation time!

+3


source to share


5 answers


The first thing I'll try is to change the elements in one of your conditions, replace:

if(withSqr / distanceSqr < nodeThresholdSqr || pNode->HasChildren == false)

      

from:

if(pNode->HasChildren == false || (withSqr / distanceSqr < nodeThresholdSqr))

      



If the first part of the condition is true pNode->HasChildren == false

and the second (withSqr / distanceSqr < nodeThresholdSqr)

will never be met (read: evaluated). Simple conditional checking is much faster than floating point operations (division in your case). You can even go to the next level: * you need to calculate distanceSqr

AT ALL when pNode->HasChildren == false

?

EDIT: even better:

if(pNode->HasChildren == false)
{
    CalculateForceNode(pBody,pNode);
}
else
{
    double distanceSqr = ((diffX * diffX) + (diffY * diffY));
    double withSqr     = pNode->width * pNode->width;
    if(withSqr / distanceSqr < nodeThresholdSqr)
    {
        CalculateForceNode(pBody,pNode);
    }
    else
    {//if not, repeat function with child
        if(pNode->Child[0]->Bodies.size() > 0)
            CheckNode(pNode->Child[0],pBody);
        //..... - all the rest of your code
    }
}

      

+6


source


Time-based profiling is not enough, you need to know what time wasted - in other words, use a more advanced profiler.

It also doesn't mention any information about the compiler or platform you are using.



For an if statement using 9% of the time, I don't think it's being compared, it's wasted fetching data. You have multiple levels of indirection (accessing data using a pointer that takes you to another pointer, etc.). This is bad for caching and branch prediction and I think you are wasting time fetching data from memory or useless computation due to branch miss prediction without doing the actual comparison.

another note I noticed: if (pNode-> HasChildren == false) you don't need all the calculations you did to find the widthSqr. I think you should rebuild your logic to check this first, if the condition is false, then you can calculate the widthSqr and continue your logic.

+3


source


  • You have to inline

    function first, Bodies.size()

    or access size

    directly so that there is no overhead when calling the function (it takes a while to push all the necessary information onto the stack and pop out).
  • I don't see all the code, but it looks like you can do a preliminary calculation widthSqr

    . It can be computed if width

    not assigned in a function.
  • You are using a lot of pointers here and it looks like your structures are scattered all over the memory. This will result in a large cache dip in the processor. Make sure that all of the data required for the calculation is in one long, contiguous and compact memory area.
  • The CalculateForceNode

    check may be whether the pre-calculated Softener*Softener

    . The function is sqrt

    very time consuming. The algorithm sqrt

    is iterative, so you can sacrifice precision for speed by doing fewer iterations, or you can use lookup tables.
  • You do the same calculations twice in CalculateForceNode

    .

    void CalculateForceNode(Body* bi, Node* bj) 
        {
            //vector from the body to the center of mass
            double vectorx = bj->CenterOfMassx - bi->posX;
            double vectory = bj->CenterOfMassy - bi->posY;
    
            //c^2 = a^2 + b^2 + softener^2
            double distSqr = vectorx * vectorx + vectory * vectory...
    
          

vectorx,vectory and distSqr

have already been calculated in CheckNode

how diffX, diffY and distanceSqr

. Manually inline whole function CalculateForceNode

.

+2


source


Since the function is called many times, you should get rid of the call overhead by CalculateForceNode(...)

manually entering the code. In this case, you will notice other tricks:

void CheckNode(Node* pNode, Body* pBody)
{    
    double diffX = (pNode->CenterOfMass - pBody->posX);
    double diffY = (pNode->CenterOfMass - pBody->posY);

    double distanceSqr = ((diffX * diffX) + (diffY * diffY));
    double widthSqr = pNode->width * pNode->width;

    if (widthSqr / distanceSqr < NodeThresholdSqr || pNode->hasChildren == false)
    {       
        //vector from the body to the center of mass
        double vectorx = pNode->CenterOfMassx - pBody->posX;
        double vectory = pNode->CenterOfMassy - pBody->posY;

        //c^2 = a^2 + b^2 + softener^2
        double distSqr = vectorx * vectorx + vectory * vectory + Softener * Softener;

        // ivnDistCube = 1/distSqr^(3/2)
        double distSixth = distSqr * distSqr * distSqr;
        double invDistCube = 1.0f / (sqrt(distSixth));

        double Accel = (pNode->TotalMass * invDistCube * _GRAV_CONST);

        pBody->AccelX += vectorx * Accel;
        pBody->AccelY += vectory * Accel;
    }
    else
    {
       CheckChildren(pNode, pBody);
    }

}

      

Now you can see that diffX = vectorx

, diffY = vectory

, distSqr = distanceSqr*Softner*Softner

. Reusing some of the calculations already done and pre-calculating what is possible should save you a few cycles:

void CheckNode(Node* pNode, Body* pBody)
{    
    double diffX = (pNode->CenterOfMass - pBody->posX);
    double diffY = (pNode->CenterOfMass - pBody->posY);

    double distanceSqr = ((diffX * diffX) + (diffY * diffY));
    double widthSqr = pNode->width * pNode->width;
    double SoftnerSq = Softener * Softener;  //precompute this value

    if (widthSqr / distanceSqr < NodeThresholdSqr || pNode->hasChildren == false)
    { 
        //c^2 = a^2 + b^2 + softener^2
        double distSqr = distanceSqr + SoftnerSq;

        // ivnDistCube = 1/distSqr^(3/2)
        double distSixth = distSqr * distSqr * distSqr;
        double invDistCube = 1.0f / (sqrt(distSixth));

        double Accel = (pNode->TotalMass * invDistCube * _GRAV_CONST);

        pBody->AccelX += diffX * Accel;
        pBody->AccelY += diffY * Accel;
    }
    else
    {
       CheckChildren(pNode, pBody);
    }

}

      

Hope this works for you.

+2


source


Change the if statement and move all your calculations inside the part pNode->hasChildren == false

:

void CheckChildren(Node* pNode, Body* pBody)
{
    if (pNode->Child[0]->Bodies.size() > 0)
        CheckNode(...
}

void CheckNode(Node* pNode, Body* pBody)
{
    if (pNode->hasChildren != false)
    {
        double diffX = (pNode->CenterOfMass - pBody->posX);
        double diffY = (pNode->CenterOfMass - pBody->posY);

        double distanceSqr = ((diffX * diffX) + (diffY * diffY));
        double widthSqr = pNode->width * pNode->width;

        if (widthSqr / distanceSqr < NodeThresholdSqr)
        {
            CalculateForceNode(pBody, pNode);
        }
        else
        {
            CheckChildren(pNode, pBody);
        }
    }
    else
    {
        CheckChildren(pNode, pBody);
    }
}

      

+1


source







All Articles