Haskell newbie: how to avoid recalculating a deterministic function with the same parameters?

I am new to Haskell, so I am just starting to understand the basic concepts.

I have the following function that converts a list of counters to a discrete probability density function:

freq2prob l = [ (curr / (sum l)) | curr <- l ]

      

Unfortunately, (sum l) is recalculated for each element of l, which hurts the computational complexity (it becomes quadratic instead of linear).

What's the most concise, elegant, idiomatic Haskell way to handle this?





2 answers


It's simple:

freq2prob l = [ curr / s | let s = sum l, curr <- l ] 

      

You can also place it outside the list comprehension: freq2prob l = let s = sum l in [ curr / s | curr <- l ] (note the in). It is really the same computation.

This is because the former is essentially translated into



freq2prob :: (Fractional a) => [a] -> [a]
freq2prob l = [ curr / s | let s = sum l, curr <- l ] 
 = do
     let s = sum l
     curr <- l
     return (curr / s)
 = let s=sum l in
   l >>= (\curr -> [curr / s])
   -- concatMap (\curr -> [curr / s]) l
   -- map (\curr -> curr / s) l

      

and the second, obviously, translates to the same code:

freq2prob l = let s = sum l in [ curr / s | curr <- l ]
 = let s = sum l in
   do
     curr <- l
     return (curr / s)
 = let s=sum l in
   l >>= (\curr -> [curr / s])
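
For example, a quick sanity check in ghci (the input list here is just an arbitrary example of mine):

ghci> let freq2prob l = let s = sum l in [ curr / s | curr <- l ]
ghci> freq2prob [1,2,3,4]
[0.1,0.2,0.3,0.4]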

      





We can use a let expression or a where clause for this:

freq2prob l = let s = sum l in 
              [ curr / s | curr <- l ]

      

or

freq2prob l = [ curr / s | curr <- l ] 
    where s = sum l

      

but it would be more idiomatic to use a higher-order function than a list comprehension, since you're doing the same thing to each element:

freq2prob l = map (/sum l) l

      

The sum l in the dividing function (/sum l) will only be evaluated once.

This is because when evaluating map f xs, the compiler does not make the elementary mistake of creating multiple copies of the function f to be evaluated separately; sum l is a single thunk that every application of the function points to, and it is evaluated the first time it is needed.
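
If you'd like that sharing to be syntactically explicit as well, you can combine map with the where clause from above (just a stylistic sketch, equivalent to the definitions already shown; the type signature mirrors the one given earlier):

freq2prob :: Fractional a => [a] -> [a]
freq2prob l = map (/ s) l
    where s = sum l   -- computed once; every division refers to this one shared thunk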

As a simple and crude test, we can look at the raw timing statistics in ghci to see whether it is noticeably faster to use the same function many times than a slightly different one each time. First I'll check whether the results of sums are cached between expressions in ghci:



ghci> sum [2..10000000]
50000004999999
(8.31 secs, 1533723640 bytes)
ghci> sum [2..10000000]
50000004999999
(8.58 secs, 1816661888 bytes)

      

So you can see that the sum was not cached between expressions, and that there is a little variance in these raw statistics. Now let's multiply by the same expensive-to-compute number every time:

ghci> map (* sum [2..10000000]) [1..10]
[50000004999999,100000009999998,150000014999997,200000019999996,250000024999995,300000029999994,350000034999993,400000039999992,450000044999991,500000049999990]
(8.30 secs, 1534499200 bytes)

      

So (allowing for a little variance) it took almost exactly the same time to multiply ten numbers by sum [2..10000000] using map as it did to multiply just one; multiplying ten pairs of numbers takes next to no time. So ghci (an interpreter, not even an optimising compiler) did not introduce multiple copies of the same calculation.

That's not because ghci is clever; it's because lazy evaluation, a nice feature of pure functional programming, never does more work than necessary. In most programming languages it would be hard to optimise away a lengthy calculation that is passed around everywhere instead of having its result saved in a variable.
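
If you want to watch the sharing happen, Debug.Trace can make it visible. This is a small illustration of my own, not part of the original answer: trace prints its message at the moment the sum is actually computed, and you will see it printed only once.

import Debug.Trace (trace)

probs :: [Double]
probs = map (/ s) [1, 2, 3, 4]
    where s = trace "computing the sum" (sum [1, 2, 3, 4])
          -- s is a single thunk, forced the first time it is needed and reused after that

main :: IO ()
main = print probs
-- prints "computing the sum" once (on stderr), then [0.1,0.2,0.3,0.4]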

Now let's compare the timing above with doing a slightly different calculation each time, where we add up slightly fewer numbers as we go:

ghci> map (\x -> sum [x..10000000]) [1..10]
[50000005000000,50000004999999,50000004999997,50000004999994,50000004999990,50000004999985,50000004999979,50000004999972,50000004999964,50000004999955]
(77.98 secs, 16796207024 bytes)

      

Well, that took about ten times as long, as we expected, because now we're asking it to do something different every time. I can verify for you that it paused for each number, whereas when we didn't change the expensive number, it was only evaluated once: the pause came before the first number and the rest appeared quickly.









