Packed big bit vector with effective number of xor and bits in Haskell
I'm looking for an efficient (in both space and time) datatype that can hold a 384 bit vector and supports efficient XOR and "bit" operations (the number of bits set to 1).
Below is my demo program. The operations I need are of the type class SOQuestionOps
and I have implemented it for Natural
and Data.Vector.Unboxed.Bit
. Especially the latter seems to be ideal as it has an operation zipWords
that should allow me to do operations such as "counting bits" and "spelling" XOR instead of bitwise. It also claims to store a bit packed (8 bits per byte).
{-# LANGUAGE FlexibleInstances #-}
import Data.Bits
import Data.List (foldl')
import Numeric.Natural
import qualified Data.Vector as V
import qualified Data.Vector.Unboxed.Bit as BV
class SOQuestionOps a where
soqoXOR :: a -> a -> a
soqoBitCount :: a -> Int
soqoFromList :: [Bool] -> a
alternating :: Int -> [Bool]
alternating n =
let c = n `mod` 2 == 0
in if n == 0
then []
else c : alternating (n-1)
instance SOQuestionOps Natural where
soqoXOR = xor
soqoBitCount = popCount
soqoFromList v =
let oneIdxs = map snd $ filter fst (zip v [0..])
in foldl' (\acc n -> acc `setBit` n) 0 oneIdxs
instance SOQuestionOps (BV.Vector BV.Bit) where
soqoXOR = BV.zipWords xor
soqoBitCount = BV.countBits
soqoFromList v = BV.fromList (map BV.fromBool v)
main =
let initialVec :: BV.Vector BV.Bit
initialVec = soqoFromList $ alternating 384
lotsOfVecs = V.replicate 10000000 (soqoFromList $ take 384 $ repeat True)
xorFolded = V.foldl' soqoXOR initialVec lotsOfVecs
sumBitCounts = V.foldl' (\n v -> n + soqoBitCount v) 0 lotsOfVecs
in putStrLn $ "folded bit count: " ++ show (soqoBitCount xorFolded) ++ ", sum: " ++ show sumBitCounts
So, let's calculate the numbers for the best case: lotsOfVecs
no need to allocate much, because it is only 10,000,000 times the same vector initialVec
. Foldl obviously creates one of these vectors for the addition operation, so it must create 10,000,000 bit vectors. The bit count must create anything other than 10,000,000 Int
s. Therefore, at best, my program should use very small (and persistent) memory, and the overall allocations should be approximately 10,000,000 * sizeof (vector bit) + 10,000,000 * sizeof (int) = 520,000,000 bytes.
Ok, run the program for Natural
:
do initialVec :: Natural
, compile with
ghc --make -rtsopts -O3 MemStuff.hs
(this is with GHC 7.10.1):
$ ./MemStuff +RTS -sstderr
folded bit count: 192, sum: 3840000000
1,280,306,112 bytes allocated in the heap
201,720 bytes copied during GC
80,106,856 bytes maximum residency (2 sample(s))
662,168 bytes maximum slop
78 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 2321 colls, 0 par 0.056s 0.059s 0.0000s 0.0530s
Gen 1 2 colls, 0 par 0.065s 0.069s 0.0346s 0.0674s
INIT time 0.000s ( 0.000s elapsed)
MUT time 0.579s ( 0.608s elapsed)
GC time 0.122s ( 0.128s elapsed)
EXIT time 0.000s ( 0.002s elapsed)
Total time 0.702s ( 0.738s elapsed)
%GC time 17.3% (17.3% elapsed)
Alloc rate 2,209,576,763 bytes per MUT second
Productivity 82.7% of total user, 78.7% of total elapsed
real 0m0.754s
user 0m0.704s
sys 0m0.037s
which has 1,280,306,112 bytes allocated in the heap
that in a ball (2x) expected figure. Btw on GHC 7.8 this allocates 353,480,272,096 bytes and works at an absolute age as popCount
not very efficient for GHC 7.8s Natural
.
EDIT : I changed the code a bit. In the original version, every other vector was 0
in stock. This gave much better distribution rates for the version Natural
. I changed it so that the vector alternates between different representations (with many bits) and now we see the 2x
distributions expected. This is another disadvantage Natural
(s Integer
): the rate of distribution depends on the values.
But maybe we can do better, let's try the tightly packed Data.Vector.Unboxed.Bit
:
This is initialVec :: BV.Vector BV.Bit
both re-compiled and re-run with the same parameters.
$ time ./MemStuff +RTS -sstderr
folded bit count: 192, sum: 1920000000
75,120,306,536 bytes allocated in the heap
54,914,640 bytes copied during GC
80,107,368 bytes maximum residency (2 sample(s))
664,128 bytes maximum slop
78 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 145985 colls, 0 par 0.543s 0.627s 0.0000s 0.0577s
Gen 1 2 colls, 0 par 0.065s 0.070s 0.0351s 0.0686s
INIT time 0.000s ( 0.000s elapsed)
MUT time 27.679s ( 28.228s elapsed)
GC time 0.608s ( 0.698s elapsed)
EXIT time 0.000s ( 0.002s elapsed)
Total time 28.288s ( 28.928s elapsed)
%GC time 2.1% (2.4% elapsed)
Alloc rate 2,714,015,097 bytes per MUT second
Productivity 97.8% of total user, 95.7% of total elapsed
real 0m28.944s
user 0m28.290s
sys 0m0.456s
This is very slow and about 100 times the distribution: (.
So let's recompile and project both runs ( ghc --make -rtsopts -O3 -prof -auto-all -caf-all -fforce-recomp MemStuff.hs
):
Version Natural
:
COST CENTRE MODULE %time %alloc
main.xorFolded Main 51.7 76.0
main.sumBitCounts.\ Main 25.4 16.0
main.sumBitCounts Main 12.1 0.0
main.lotsOfVecs Main 10.4 8.0
Version Data.Vector.Unboxed.Bit
:
COST CENTRE MODULE %time %alloc
soqoXOR Main 96.7 99.3
main.sumBitCounts.\ Main 1.9 0.2
Is it really the Natural
best option for a fixed size bit vector? What about GHC 6.8? And is there anything better that my type class can implement SOQuestionOps
?
source to share
Check out the module Data.LargeWord
in the package Crypto
:
http://hackage.haskell.org/package/Crypto-4.2.5.1/docs/Data-LargeWord.html
It provides instances Bits
for large words of different sizes eg. 96 to 256 bits.
source to share