How can I write a more general (but efficient) version of attoparsec's takeWhile1?
Data.Attoparsec.Text exports takeWhile and takeWhile1:
takeWhile :: (Char -> Bool) -> Parser Text
Consume input as long as the predicate returns True, and return the consumed input. This parser does not fail. It will return an empty string if the predicate returns False on the first character of input. [...]
takeWhile1 :: (Char -> Bool) -> Parser Text
Consume input as long as the predicate returns True, and return the consumed input. This parser requires the predicate to succeed on at least one character of input: it will fail if the predicate never returns True or if there is no input left.
The attoparsec documentation encourages the user to
Use the Text-oriented parsers whenever possible, e.g. takeWhile1 instead of many1 anyChar. There is about a factor of 100 difference in performance between the two kinds of parser.
These two parsers are very useful, but I keep feeling the need for a more general version of takeWhile1: some hypothetical parser
takeWhileLo :: (Char -> Bool) -> Int -> Parser Text
takeWhileLo f lo = undefined
which parses at least lo characters satisfying the predicate f, where lo is an arbitrary nonnegative integer.
I've looked at the implementation of takeWhile1, but it uses a bunch of functions private to Data.Attoparsec.Text.Internal and doesn't seem easy to generalize.
I came up with the following applicative implementation:
{-# LANGUAGE OverloadedStrings #-}

import           Prelude              hiding ( takeWhile )
import           Control.Applicative  ( (<*>) )
import           Data.Text            ( Text )
import qualified Data.Text            as T
import           Data.Attoparsec.Text

takeWhileLo :: (Char -> Bool) -> Int -> Parser Text
takeWhileLo f lo =
  T.append . T.pack <$> count lo (satisfy f) <*> takeWhile f
It works as advertised,
λ> parseOnly (takeWhileLo (== 'a') 4) "aaa"
Left "not enough input"
λ> parseOnly (takeWhileLo (== 'a') 4) "aaaa"
Right "aaaa"
λ> parseOnly (takeWhileLo (== 'a') 4) "aaaaaaaaaaaaa"
Right "aaaaaaaaaaaaa"
but the need to pack the intermediate list of results returned by count bothers me, especially for cases where lo is large. It seems to go against the recommendation to
use the Text-oriented parsers whenever possible [...]
Am I missing something? Is there a more efficient / idiomatic way to implement such a takeWhileLo combinator?
Parser is a monad, so you can just inspect the return value and fail if the length is not right:
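import Control.Applicative ( empty )  -- needed in addition to the question's imports

takeWhileLo :: (Char -> Bool) -> Int -> Parser Text
takeWhileLo f lo = do
  text <- takeWhile f                 -- consume the whole matching run as one Text
  case T.compareLength text lo of
    LT -> empty                       -- fewer than lo characters matched: fail
    _  -> return text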
compareLength is from the text package. It is more efficient than comparing against the result of T.length, because compareLength can short-circuit as soon as the text is known to be longer than lo.
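For completeness, here is a minimal self-contained sketch of the approach above, written out with explicit imports so it can be loaded into GHCi as is. One thing differs from the code in the answer, and it is my own assumption rather than part of the answer: it calls fail with a descriptive message instead of empty, so that the Left value says what went wrong. The wording of that message is purely illustrative.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Prelude              hiding (takeWhile)
import           Data.Attoparsec.Text (Parser, parseOnly, takeWhile)
import           Data.Text            (Text)
import qualified Data.Text            as T

-- | Consume the longest run of characters satisfying @f@ and fail
-- unless that run is at least @lo@ characters long.
takeWhileLo :: (Char -> Bool) -> Int -> Parser Text
takeWhileLo f lo = do
  -- takeWhile never fails and returns the matching run as a single Text,
  -- so no intermediate list of Char is built.
  text <- takeWhile f
  -- compareLength can stop counting once the run is known to exceed lo.
  case T.compareLength text lo of
    -- Assumption: a custom message via fail; the answer uses empty here.
    LT -> fail ("takeWhileLo: expected at least " ++ show lo
                ++ " matching characters")
    _  -> return text
```

With this definition, parseOnly (takeWhileLo (== 'a') 4) "aaaa" gives Right "aaaa", while the same parser applied to "aaa" gives a Left carrying the failure message. Note that the text of the error differs from the "not enough input" produced by the count-based version in the question, which may matter if callers inspect error strings.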