Parallel "inserts" into binary trie in Haskell

I have a list of n-bit words

type BitWord = [Bool]

      

and a trie that stores the word from top to bottom:

data Tree = Bs Tree Tree  -- Bs (zero_bit) (one_bit)
          | X -- incomplete word
          | B -- final bit of word

      

I have a function:

seenPreviously :: BitWord -> Tree -> (Tree,Bool)

      

The function passes through bits in BitWord

, and when passing through Tree

- to the left with a zero bit and vice versa. We return a new tree with this BitWord

"merged in" along with True if we needed to add subtrees at some point (ie BitWord

not already in the trie) and False otherwise.

I am mapping this function over [BitWord]

, passing the state of the tree as.

My question is this . Could this benefit from the parallelism offered by Control.Parallel? And if so, how can I reason about laziness and evaluation only for a weak head of normal form, etc.?

My instinct is that I can insert (effectively building a subtree) down the left branch while doing the same on the right branch as two independent threads. Something like:

parallelMapSt :: [ BitWords ] -> Tree -> [Bool]
parallelMapSt [] _ = []
parallelMapSt (w:ws) t = let (b,t') = seenPreviously w t
                             bs     = parralelMapSt ws t'
                          in t' `par` bs `pseq` (b:bs)

      

Stream estimation b

depends on some previously curved streams (the ones that belong BitWords

to that share the c prefix w

), but not all of them, so it seems like there is a way to work in parallel here, but I'm really not sure.

+2


source to share


3 answers


Looks like a great candidate for par

tree traversal ... As with binary trees. Try to write several programs of this type and measure the effect par

.



+4


source


Going back whether the word in the trie was unnecessarily ordering your program. If you really need this information, it will probably be difficult to parallelize efficiently.

However, if we can rephrase the problem a little so that the order and position of the inserts don't matter, the problem is pretty simple:



import Control.Parallel

data Tree = Bs Bool         -- ^ is an empty word inserted here?
               (Maybe Tree) -- ^ '0' subtree
               (Maybe Tree) -- ^ '1' subtree
     deriving Show

insertMany :: [[Bool]] -> Maybe Tree
insertMany []  = Nothing
insertMany xss = hasEnd `par` fs `par` ts `pseq` Just (Bs hasEnd fs ts)
 where
    hasEnd = any null xss
    fs = insertMany [ xs | False : xs <- xss]
    ts = insertMany [ xs | True  : xs <- xss]

      

I don't have multiple cores at the moment, so I can't test this, but it should scale well. We basically got a parallel radix sort in just a few lines - not too shabby!

+4


source


Why don't you just try and see? Program execution time with 1 thread and multiple, and see if there is a difference. Sparks are really cheap in Haskell, so don't worry if you create them.

+1


source







All Articles