Parallel "inserts" into binary trie in Haskell
I have a list of n-bit words
type BitWord = [Bool]
and a trie that stores the word from top to bottom:
data Tree = Bs Tree Tree -- Bs (zero_bit) (one_bit)
| X -- incomplete word
| B -- final bit of word
I have a function:
seenPreviously :: BitWord -> Tree -> (Tree,Bool)
The function passes through bits in BitWord
, and when passing through Tree
- to the left with a zero bit and vice versa. We return a new tree with this BitWord
"merged in" along with True if we needed to add subtrees at some point (ie BitWord
not already in the trie) and False otherwise.
I am mapping this function over [BitWord]
, passing the state of the tree as.
My question is this . Could this benefit from the parallelism offered by Control.Parallel? And if so, how can I reason about laziness and evaluation only for a weak head of normal form, etc.?
My instinct is that I can insert (effectively building a subtree) down the left branch while doing the same on the right branch as two independent threads. Something like:
parallelMapSt :: [ BitWords ] -> Tree -> [Bool]
parallelMapSt [] _ = []
parallelMapSt (w:ws) t = let (b,t') = seenPreviously w t
bs = parralelMapSt ws t'
in t' `par` bs `pseq` (b:bs)
Stream estimation b
depends on some previously curved streams (the ones that belong BitWords
to that share the c prefix w
), but not all of them, so it seems like there is a way to work in parallel here, but I'm really not sure.
source to share
Going back whether the word in the trie was unnecessarily ordering your program. If you really need this information, it will probably be difficult to parallelize efficiently.
However, if we can rephrase the problem a little so that the order and position of the inserts don't matter, the problem is pretty simple:
import Control.Parallel
data Tree = Bs Bool -- ^ is an empty word inserted here?
(Maybe Tree) -- ^ '0' subtree
(Maybe Tree) -- ^ '1' subtree
deriving Show
insertMany :: [[Bool]] -> Maybe Tree
insertMany [] = Nothing
insertMany xss = hasEnd `par` fs `par` ts `pseq` Just (Bs hasEnd fs ts)
where
hasEnd = any null xss
fs = insertMany [ xs | False : xs <- xss]
ts = insertMany [ xs | True : xs <- xss]
I don't have multiple cores at the moment, so I can't test this, but it should scale well. We basically got a parallel radix sort in just a few lines - not too shabby!