Performance Algorithm - Order - Tree (data structure) only solution?

I have a problem that looks simple at first glance, and it is, but I am looking for another (maybe easier) solution:


SumA = V1 + V2
SumB = SumA + V3
SumC = SumB + SumA
SumD = SumC + V0


As we can see here, the "base" variables are V0, V1, V2, V3 and V4 (the value of each of them is returned by a database query).

The user will ask the software to return the results of V1 and SumC.


Solution I know:

Find all required variables: V1, SumC, SumB, SumA, V3, V2

For performance, I want to do the math for each variable JUST ONE TIME.

This means that I need to order expressions from "base expressions" to "top variables".

At this point the only solution I see is something like a "Tree (data structure)": get V1, V2 and V3, then get SumA, then SumB, and only at the end get SumC.

Is there any other way to solve this problem?

The ultimate goal of this algorithm is to handle more complex variables and several "intermediate variables". So performance is critical: I cannot perform the same math operation more than once.



4 answers

I'm not sure I fully understand, but I think you mean common subexpression elimination [or something similar], which is a very common compiler optimization.

One common way of doing this optimization is to build a graph [which is actually a DAG] of the expressions in the program and iteratively add new expressions to it. The "sources" in your DAG are all the initial variables [V0, V1, V2, V3, V4 in your example]. You can tell that an expression is redundant if you have already calculated it, and so avoid recalculating it.

These lecture notes seem to give a more detailed explanation (although I admit I haven't read all of them).
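
As a rough illustration of the DAG idea, here is a minimal Python sketch that evaluates only the variables the user asked for, each exactly once, in dependency order. It assumes every derived variable is a plain sum of its inputs, as in the question. The names `FORMULAS`, `needed`, `evaluate`, `load_base` and the values in `BASE` are made up for this example; `load_base` stands in for the database query.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each derived variable maps to the names it adds together (the question's example).
FORMULAS = {
    "SumA": ["V1", "V2"],
    "SumB": ["SumA", "V3"],
    "SumC": ["SumB", "SumA"],
    "SumD": ["SumC", "V0"],
}

def needed(targets, formulas):
    """Collect every variable reachable from the requested targets."""
    seen, stack = set(), list(targets)
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(formulas.get(name, []))
    return seen

def evaluate(targets, formulas, load_base):
    """Compute the targets, evaluating each required variable exactly once."""
    required = needed(targets, formulas)
    # TopologicalSorter expects {node: predecessors}; dependencies come out first.
    graph = {name: formulas.get(name, []) for name in required}
    values = {}
    for name in TopologicalSorter(graph).static_order():
        if name in formulas:
            values[name] = sum(values[dep] for dep in formulas[name])
        else:
            values[name] = load_base(name)  # stand-in for the database query
    return {name: values[name] for name in targets}

# Hypothetical base values and a loader that records what it was asked for.
BASE = {"V0": 10, "V1": 1, "V2": 2, "V3": 3, "V4": 4}
loaded = []

def load_base(name):
    loaded.append(name)
    return BASE[name]

result = evaluate(["V1", "SumC"], FORMULAS, load_base)
print(result)  # {'V1': 1, 'SumC': 9}
```

In this run only V1, V2 and V3 are loaded; V0, V4 and SumD are never touched because nothing requested depends on them.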



First of all, you need to build a tree with all the expressions. Trees are the simplest data structure for this case.

Now, suppose you have the following formulas:

SumA = v1 + v2
SumB = v1 + v2 + v3
SumC = ...


and the user asks for SumB (so you know how to calculate SumC, but you don't have to in order to make the user happy).

In memory, it looks like this:

SumA = Add( v1, v2 )
SumB = Add( Add( v1, v2 ), v3 )


The next step is to define comparison operators that indicate whether two subtrees are the same. By running them, you will notice that Add( v1, v2 ) appears twice, so you can optimize:

SumA = Add( v1, v2 )
SumB = Add( SumA, v3 )
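
One possible way to get such comparison operators in Python is to use frozen dataclasses, which compare and hash by structure. The sketch below is only an illustration: the class names `Var` and `Add` and the helper `count_subtrees` are assumptions made for this example, and it treats Add( v1, v2 ) and Add( v2, v1 ) as different trees (commutativity is not handled).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Add:
    left: object
    right: object

def count_subtrees(expr, counts):
    """Record how many times each Add subtree occurs (structural equality)."""
    if isinstance(expr, Add):
        counts[expr] = counts.get(expr, 0) + 1
        count_subtrees(expr.left, counts)
        count_subtrees(expr.right, counts)
    return counts

v1, v2, v3 = Var("v1"), Var("v2"), Var("v3")
sum_a = Add(v1, v2)
sum_b = Add(Add(v1, v2), v3)  # built independently of sum_a

counts = {}
for tree in (sum_a, sum_b):
    count_subtrees(tree, counts)

# Subtrees that occur more than once are candidates for sharing.
shared = [expr for expr, n in counts.items() if n > 1]
```

Here `shared` contains only Add( v1, v2 ), which is the subtree that can be replaced by a reference to SumA.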


This means you can get the results with a minimum of computation. The next step is to add caching to your expressions: when someone asks for an expression's value, it caches the result so the next call to getValue() can return the last result.

This means that evaluating SumA or SumB will fill the cache for SumA. Since you never ask for the value of SumC, it never gets evaluated and therefore costs nothing.
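
A minimal sketch of this kind of caching, under a few assumptions: the class names `Node`, `Const` and `Add` are invented for the example, `get_value()` is the Python-style spelling of getValue(), the `evaluations` counter exists only to show what was computed, and `sum_c` borrows the question's definition (SumB + SumA) because this answer leaves SumC unspecified.

```python
class Node:
    """Computes its value on the first request, then reuses the cached result."""

    def __init__(self):
        self._cached = False
        self._value = None
        self.evaluations = 0  # demonstration only: how often _compute() ran

    def get_value(self):
        if not self._cached:
            self._value = self._compute()
            self._cached = True
            self.evaluations += 1
        return self._value

class Const(Node):
    """Stand-in for a base variable; a real one would query the database."""

    def __init__(self, value):
        super().__init__()
        self._const = value

    def _compute(self):
        return self._const

class Add(Node):
    def __init__(self, left, right):
        super().__init__()
        self.left, self.right = left, right

    def _compute(self):
        return self.left.get_value() + self.right.get_value()

v1, v2, v3 = Const(1), Const(2), Const(3)  # made-up values
sum_a = Add(v1, v2)
sum_b = Add(sum_a, v3)      # shares the sum_a node instead of rebuilding it
sum_c = Add(sum_b, sum_a)   # defined, but never requested below

result = sum_b.get_value()  # computes sum_a once, then sum_b
again = sum_a.get_value()   # served from the cache
```

After these two calls, `sum_a` has been computed once and `sum_c` not at all.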



The only way to speed it up further is to use serialization (pipelining) at a level that you cannot reach in software unless you build your own hardware. Example (image not available):

Please ignore the note in the top right corner, this is stolen from my script :)

Case A: 100 * 4 cycles = 400 cycles

Case B: The first result takes 3 cycles, each subsequent one takes only 1 (serialization, like a Ford assembly line): 102 cycles

102 versus 400: about a 4x speedup.

Modern processors can do this to some extent automatically, but it's quite difficult to measure. I have heard that ICC (the Intel C++ Compiler) optimizes the build to make the most of this, which is perhaps part of the reason it beats everything else on Intel CPUs :)



Perhaps you could simplify it and eliminate the middle step:

SumA = (V1 + V2)*2
SumC = V3 + SumA
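
A quick way to convince yourself that this gives the same SumC as the question's formulas is to compare both versions over a range of integer inputs. This is only a sanity check, not a proof, and the function names `original` and `simplified` are made up for the example. Note that SumA in the simplified version is twice the question's SumA, so it cannot be returned as SumA if the user asks for it.

```python
def original(v1, v2, v3):
    """SumC computed step by step, as in the question."""
    sum_a = v1 + v2
    sum_b = sum_a + v3
    return sum_b + sum_a

def simplified(v1, v2, v3):
    """SumC with the middle step (SumB) folded away."""
    sum_a = (v1 + v2) * 2  # twice the question's SumA
    return v3 + sum_a

values = range(-5, 6)
same = all(
    original(a, b, c) == simplified(a, b, c)
    for a in values
    for b in values
    for c in values
)
```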



