How to avoid recomputation of a pure function with the same parameters?


I have the following function that converts a list of counts to a discrete probability distribution:

freq2prob l = [ curr / (sum l) | curr <- l ]

Unfortunately, (sum l) is recomputed for each element of l, making the complexity quadratic instead of linear.

What is the most concise, elegant, "haskellic" way to deal with this?

There are 2 answers below.

BEST ANSWER

It's simple:

freq2prob l = [ curr / s | let s = sum l, curr <- l ] 

You can put the binding outside the list comprehension as well: freq2prob l = let s = sum l in [ curr / s | curr <- l ] (notice the in). It is effectively the same computation.
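For example, either version gives (a quick check in ghci; the literals default to Double):

ghci> freq2prob [1,2,3,4]
[0.1,0.2,0.3,0.4]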

That is because the first is essentially translated into

freq2prob :: (Fractional a) => [a] -> [a]
freq2prob l = [ curr / s | let s = sum l, curr <- l ] 
 = do
     let s = sum l
     curr <- l
     return (curr / s)
 = let s=sum l in
   l >>= (\curr -> [curr / s])
   -- concatMap (\curr -> [curr / s]) l
   -- map (\curr -> curr / s) l

and the second, obviously, to the same code,

freq2prob l = let s = sum l in [ curr / s | curr <- l ]
 = let s = sum l in
   do
     curr <- l
     return (curr / s)
 = let s=sum l in
   l >>= (\curr -> [curr / s])
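If you want to see this sharing for yourself, here is a small sketch of my own (not from the original answer) that uses Debug.Trace to make the sum announce when it is actually computed:

import Debug.Trace (trace)

-- the trace message is printed (to stderr) the moment the thunk s is forced
freq2prob :: [Double] -> [Double]
freq2prob l = [ curr / s | let s = trace "summing!" (sum l), curr <- l ]

main :: IO ()
main = print (freq2prob [1, 2, 3, 4])

Running this prints "summing!" exactly once before [0.1,0.2,0.3,0.4], confirming that all four divisions share a single evaluation of the sum.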
SECOND ANSWER

We can use a let expression or a where clause for this:

freq2prob l = let s = sum l in 
              [ curr / s | curr <- l ]

or

freq2prob l = [ curr / s | curr <- l ] 
    where s = sum l

but it would be more idiomatic to use a higher-order function than a list comprehension, since you're doing the same thing to each element:

freq2prob l = map (/sum l) l

The sum l in the dividing function (/sum l) will only be evaluated once.

This is because when evaluating map f xs, the compiler doesn't make the elementary mistake of creating multiple copies of the function f to be evaluated separately: sum l becomes a single thunk that every application of the function points to, so it is evaluated at most once.
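Since the question talks about a list of counts, the input is probably integral; here is a minimal runnable sketch of the map version under that assumption (the Int type and the fromIntegral conversions are my addition):

-- s is a single shared thunk: forced once, reused for every element
freq2prob :: [Int] -> [Double]
freq2prob l = map ((/ s) . fromIntegral) l
  where
    s = fromIntegral (sum l)

main :: IO ()
main = print (freq2prob [2, 3, 5])  -- prints [0.2,0.3,0.5]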

As a simple and blunt test, we can look at crude timing stats in ghci (printed after each evaluation with :set +s) to see whether it's noticeably faster to apply the same function repeatedly or a slightly different function each time. First I'll check whether the result of a sum is cached between uses in ghci:

ghci> sum [2..10000000]
50000004999999
(8.31 secs, 1533723640 bytes)
ghci> sum [2..10000000]
50000004999999
(8.58 secs, 1816661888 bytes)

So you can see it wasn't cached, and that there's a little variance in these crude stats. Now let's multiply by the same complicated thing every time:

ghci> map (* sum [2..10000000]) [1..10]
[50000004999999,100000009999998,150000014999997,200000019999996,250000024999995,300000029999994,350000034999993,400000039999992,450000044999991,500000049999990]
(8.30 secs, 1534499200 bytes)

So (allowing for a little variance) it took almost exactly the same time to multiply ten numbers by sum [2..10000000] using map as it took to compute that sum once on its own. Multiplying ten pairs of numbers takes hardly any time at all, so ghci (an interpreter, not even an optimising compiler) didn't introduce multiple copies of the same calculation.

This isn't because ghci is clever; it's because lazy evaluation, a nice feature of pure functional programming, never does more work than necessary. In most programming languages it would be hard to optimise away passing a lengthy calculation around all over the place instead of saving its result in a variable.

Now let's compare that with doing a slightly different calculation each time, where we add up slightly fewer numbers as we go.

ghci> map (\x -> sum [x..10000000]) [1..10]
[50000005000000,50000004999999,50000004999997,50000004999994,50000004999990,50000004999985,50000004999979,50000004999972,50000004999964,50000004999955]
(77.98 secs, 16796207024 bytes)

Well, that took roughly ten times as long, as we expected, because now we're asking it to do something different every time. I can verify that it paused before each number, whereas when we didn't change the expensive-to-calculate number it was evaluated only once: the pause came before the first number, and the rest appeared rapidly.