Isn't this a double traversal?
In the programming tips section of the Haskell wiki I found this example:
count :: (a -> Bool) -> [a] -> Int
count p = length . filter p
It is said to be a better alternative to the explicit recursion:
count :: (a -> Bool) -> [a] -> Int
count _ [] = 0
count p (x:xs)
| p x = 1 + count p xs
| otherwise = count p xs
Which, in terms of readability, I totally agree with.
However, isn't the first version a double traversal of the list, and therefore actually worse than the explicitly recursive function? Does laziness in GHC mean it is equivalent to a single traversal after optimization? Which implementation is faster, and why?
To see what the optimizer actually does, look at the Core:
% ghc -O2 -ddump-simpl Temp.hs
[1 of 1] Compiling Temp ( Temp.hs, Temp.o )
==================== Tidy Core ====================
Result size of Tidy Core = {terms: 29, types: 26, coercions: 0}
Temp.count
:: forall a_arN.
(a_arN -> GHC.Types.Bool) -> [a_arN] -> GHC.Types.Int
[GblId,
Arity=2,
Caf=NoCafRefs,
Str=DmdType <L,C(U)><S,1*U>,
Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=2, Value=True,
ConLike=True, WorkFree=True, Expandable=True,
Guidance=IF_ARGS [60 0] 191 0}]
Temp.count =
\ (@ a_aMA)
(p_arV :: a_aMA -> GHC.Types.Bool)
(eta_B1 :: [a_aMA]) ->
letrec {
go_aNr [Occ=LoopBreaker]
:: [a_aMA] -> GHC.Prim.Int# -> GHC.Types.Int
[LclId, Arity=1, Str=DmdType <S,1*U>]
go_aNr =
\ (ds_aNs :: [a_aMA]) ->
case ds_aNs of _ [Occ=Dead] {
[] -> GHC.Types.I#;
: y_aNx ys_aNy ->
case p_arV y_aNx of _ [Occ=Dead] {
GHC.Types.False -> go_aNr ys_aNy;
GHC.Types.True ->
let {
g_a10B [Dmd=<L,C(U)>] :: GHC.Prim.Int# -> GHC.Types.Int
[LclId, Str=DmdType]
g_a10B = go_aNr ys_aNy } in
\ (x_a10C :: GHC.Prim.Int#) -> g_a10B (GHC.Prim.+# x_a10C 1)
}
}; } in
go_aNr eta_B1 0
Cleaning that up a little:
Temp.count :: forall aType. (aType -> Bool) -> [aType] -> Int
Temp.count = \(@ aType) (p :: aType -> Bool) (as :: [aType]) ->
letrec {
go :: [aType] -> GHC.Prim.Int# -> Int
go = \(xs :: [aType]) ->
case xs of _ {
[] -> I#; -- constructor to make a GHC.Prim.Int# into an Int
: y ys ->
case p y of _ {
False -> go ys;
True ->
let {
go' :: GHC.Prim.Int# -> Int
go' = go ys
} in \(x :: GHC.Prim.Int#) -> go' (GHC.Prim.+# x 1)
}
};
} in go as 0
Since we are working with the unboxed type GHC.Prim.Int#, the accumulation is strict, so there is only a single loop over the data.
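As a rough source-level rendering of that Core (a sketch of my own; it uses a boxed Int with a bang pattern for readability, where GHC uses GHC.Prim.Int# directly), the worker go threads a strict counter through one pass:

```haskell
{-# LANGUAGE BangPatterns #-}

-- Hand-written approximation of the worker loop GHC derives
-- for count p = length . filter p: one pass, one strict counter.
count :: (a -> Bool) -> [a] -> Int
count p as = go as 0
  where
    go []     !acc = acc
    go (y:ys) !acc
      | p y       = go ys (acc + 1)
      | otherwise = go ys acc
```

The Core above actually adds 1 to the counter slightly later, in a returned closure, but the shape is the same: a single tail-recursive loop over the list.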
In either case, you need to perform one or two operations per element: the predicate must be checked, and, depending on its result, 1 must be added to the count. So, ignoring cache effects and the like, both versions perform the same number of operations.
The naive picture is that the first version makes two separate passes: one selects the matching elements, the other counts them. If the list is larger than the cache can hold, that would slow processing down, and in a strict language that is indeed what happens.
However, Haskell's laziness comes into play here. The list produced by filter is evaluated (brought into existence) piecewise, exactly as the counting function length demands it. And since length uses the elements only for counting and keeps no reference to the newly created list, they become garbage immediately and can be collected right away. Thus the computation needs only O(1) memory at any point in time.
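A small runnable illustration of this (the list size is my own choice): counting even numbers in a ten-million-element list with length . filter runs in constant space, because each cons cell is consumed and dropped as soon as length has seen it.

```haskell
-- The intermediate list produced by filter never exists in full:
-- length demands one cell at a time, counts it, and drops it,
-- so the whole computation runs in O(1) additional space.
main :: IO ()
main = print (length (filter even [1 :: Int .. 10000000]))
-- prints 5000000
```

Running it with a tight heap limit, e.g. ./Main +RTS -M8m, confirms that no large intermediate list is retained.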
There is a small overhead in the first version for building the intermediate "filtered" list. But this is usually negligible compared to the cache effects that occur when lists are large. (For small lists it can make a difference; you would have to measure.) Moreover, depending on the compiler and optimization level, it can be optimized away.
Update: as pointed out in one of the comments, the second version actually consumes memory. For a fair comparison you would need to write it with an accumulating parameter and a strictness annotation in the right place, because (I expect) length is already written that way.
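Such a fair explicit version might look like this (a sketch; here the strictness annotation is supplied by foldl', whose accumulator is forced at each step):

```haskell
import Data.List (foldl')

-- Explicit single-pass count with a strict accumulating parameter,
-- comparable to how length itself is implemented.
countAcc :: (a -> Bool) -> [a] -> Int
countAcc p = foldl' (\acc x -> if p x then acc + 1 else acc) 0
```

Unlike 1 + count p xs, this builds neither a chain of pending additions nor a thunk chain in the accumulator.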
You can check which version is faster by profiling, for example:
module Main where
main :: IO ()
main = print countme'
count' :: (a -> Bool) -> [a] -> Int
count' p = length . filter p
count :: (a -> Bool) -> [a] -> Int
count _ [] = 0
count p (x:xs)
| p x = 1 + count p xs
| otherwise = count p xs
list = [0..93823249]
countme' = count' (\x -> x `mod` 15 == 0) list
countme = count (\x -> x `mod` 15 == 0) list
Then run ghc -O2 -prof -fprof-auto -rtsopts Main.hs followed by ./Main +RTS -p
. This produces a Main.prof file. Then change the main function to use countme
instead and compare the results. Mine are:
- 4.12s for implicit version
- 6.34s for explicit version
If you turn off optimization, the implicit one will be slightly (but not much) faster.
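If you prefer wall-clock-style numbers without profiling, a quick (and admittedly crude) harness using getCPUTime from base can time both versions in one run; the list size and labels here are my own choices:

```haskell
import System.CPUTime (getCPUTime)
import Text.Printf (printf)

countImplicit :: (a -> Bool) -> [a] -> Int
countImplicit p = length . filter p

countExplicit :: (a -> Bool) -> [a] -> Int
countExplicit _ [] = 0
countExplicit p (x:xs)
  | p x       = 1 + countExplicit p xs
  | otherwise = countExplicit p xs

-- Time an already-applied computation by forcing its Int result.
timeIt :: String -> Int -> IO ()
timeIt label n = do
  start <- getCPUTime
  print n  -- forcing n here evaluates the whole count
  end <- getCPUTime
  printf "%s: %.3fs\n" label
         (fromIntegral (end - start) / 1e12 :: Double)

main :: IO ()
main = do
  let xs = [0 .. 999999] :: [Int]
  timeIt "implicit" (countImplicit (\x -> x `mod` 15 == 0) xs)
  timeIt "explicit" (countExplicit (\x -> x `mod` 15 == 0) xs)
```

CPU-time measurements like this are coarse; for serious benchmarking a library such as criterion is the usual choice.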
Apart from fusion and laziness, which others have already explained, I believe another reason may be that length and filter are Prelude functions, which the compiler can optimize particularly well.