Isn't this a double traversal?

In the programming tips section of the Haskell wiki I found this example:

count :: (a -> Bool) -> [a] -> Int
count p = length . filter p


It is said to be a better alternative to the explicit recursion:

count :: (a -> Bool) -> [a] -> Int
count _ [] = 0
count p (x:xs)
   | p x       = 1 + count p xs
   | otherwise =     count p xs


Which, in terms of readability, I totally agree with.

However, isn't this a double traversal, and therefore actually worse than an explicit recursive function? Does laziness in GHC mean that after optimization it is equivalent to a single pass? Which implementation is faster, and why?



3 answers

So, to see what the optimizer actually does, look at the Core:

% ghc -O2 -ddump-simpl Temp.hs
[1 of 1] Compiling Temp             ( Temp.hs, Temp.o )

==================== Tidy Core ====================
Result size of Tidy Core = {terms: 29, types: 26, coercions: 0}

Temp.count
  :: forall a_arN.
     (a_arN -> GHC.Types.Bool) -> [a_arN] -> GHC.Types.Int
[GblId,
 Str=DmdType <L,C(U)><S,1*U>,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=2, Value=True,
         ConLike=True, WorkFree=True, Expandable=True,
         Guidance=IF_ARGS [60 0] 191 0}]
Temp.count =
  \ (@ a_aMA)
    (p_arV :: a_aMA -> GHC.Types.Bool)
    (eta_B1 :: [a_aMA]) ->
    letrec {
      go_aNr [Occ=LoopBreaker]
        :: [a_aMA] -> GHC.Prim.Int# -> GHC.Types.Int
      [LclId, Arity=1, Str=DmdType <S,1*U>]
      go_aNr =
        \ (ds_aNs :: [a_aMA]) ->
          case ds_aNs of _ [Occ=Dead] {
            [] -> GHC.Types.I#;
            : y_aNx ys_aNy ->
              case p_arV y_aNx of _ [Occ=Dead] {
                GHC.Types.False -> go_aNr ys_aNy;
                GHC.Types.True ->
                  let {
                    g_a10B [Dmd=<L,C(U)>] :: GHC.Prim.Int# -> GHC.Types.Int
                    [LclId, Str=DmdType]
                    g_a10B = go_aNr ys_aNy } in
                  \ (x_a10C :: GHC.Prim.Int#) -> g_a10B (GHC.Prim.+# x_a10C 1)
              }
          }; } in
    go_aNr eta_B1 0


Cleaning it up a little:

Temp.count :: forall aType.  (aType -> Bool) -> [aType] -> Int
Temp.count = \(@ aType) (p :: aType -> Bool) (as :: [aType]) ->
  letrec {
    go :: [aType] -> GHC.Prim.Int# -> Int
    go = \(xs :: [aType]) ->
      case xs of _ {
        [] -> I#; -- constructor to make a GHC.Prim.Int# into an Int
        : y ys ->
          case p y of _ {
            False -> go ys;
            True ->
              let {
                go' :: GHC.Prim.Int# -> Int
                go' = go ys 
              } in \(x :: GHC.Prim.Int#) -> go' (GHC.Prim.+# x 1)
          }
      }
  } in go as 0


Since we are working with the unboxed type GHC.Prim.Int#, all the additions are strict, so we only have one loop over the data.
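In source-Haskell terms, that loop corresponds roughly to the following (a hand-written reconstruction on my part, not compiler output; a strict Int accumulator stands in for the unboxed Int#):

```haskell
{-# LANGUAGE BangPatterns #-}

-- Sketch of the single fused loop GHC derives from length . filter p:
-- one traversal, counting as it goes.
countLoop :: (a -> Bool) -> [a] -> Int
countLoop p = go 0
  where
    go !acc []     = acc            -- end of list: return the count
    go !acc (x:xs)
      | p x       = go (acc + 1) xs -- element matches: bump the counter
      | otherwise = go acc xs       -- element skipped: keep counting
```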



Either way, you have to do one or two operations per element: the predicate check is always needed, and the addition of 1 depends on the predicate's result.

So, unless you consider cache effects and the like, both versions perform the same number of operations.

The naive picture is that the first version makes two separate passes: one that filters the elements and one that counts them. If the list is larger than the cache can hold, this slows processing down, and that is what actually happens in a strict language.

However, Haskell's laziness comes into play here. The list produced by filter is evaluated (materialized) piecewise, as the counting function length demands it. And since length uses the elements only for counting and keeps no reference to the newly created list, each element becomes eligible for garbage collection immediately. Thus the computation takes O(1) memory at any point in time.
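A small demonstration of this piecewise evaluation (my own example, not from the answer): filter over an infinite list still yields a result, because the consumer demands elements one at a time.

```haskell
-- If filter had to build its entire output before the consumer could
-- run, this would never terminate; laziness makes it finish instantly.
firstEvens :: Int -> [Int]
firstEvens n = take n (filter even [1..])
```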


There is a small overhead for allocating the cells of the intermediate "filtered" list in the first version. But this is usually negligible compared to the cache effects that occur when lists are large. (For small lists it can make a difference; that needs to be measured.) Moreover, it may be optimized away, depending on the compiler and optimization level.

Update: as noted in one of the comments, the second version actually consumes memory, since the pending 1 + … additions pile up. For a fair comparison you would need to write it with an accumulating parameter and a strictness annotation in the right place, because (I expect) length is already written that way.
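Such a fair version could be written, for instance, with foldl' (a sketch on my part; the name countStrict is mine):

```haskell
import Data.List (foldl')

-- Strict, accumulator-based explicit version: foldl' forces the
-- accumulator at each step, so no chain of pending additions builds up.
countStrict :: (a -> Bool) -> [a] -> Int
countStrict p = foldl' (\acc x -> if p x then acc + 1 else acc) 0
```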



You can check which version is faster by profiling, for example:

module Main where

main :: IO ()
main = print countme'

count' :: (a -> Bool) -> [a] -> Int
count' p = length . filter p

count :: (a -> Bool) -> [a] -> Int
count _ [] = 0
count p (x:xs)
   | p x       = 1 + count p xs
   | otherwise =     count p xs

list = [0..93823249]

countme' = count' (\x -> x `mod` 15 == 0) list
countme = count (\x -> x `mod` 15 == 0) list


Then run ghc -O2 -prof -fprof-auto -rtsopts Main.hs and ./Main +RTS -p. This produces a Main.prof file. Then change main to use countme instead and compare the results. Mine are:

  • 4.12s for implicit version
  • 6.34s for explicit version

If you turn off optimization, the implicit one will be slightly (but not much) faster.

Apart from fusion and laziness, which others have already explained, I believe another reason may be that length and filter are Prelude functions and can be better optimized by the compiler.


