How can I write a more general (but efficient) version of attoparsec takeWhile1?

Question

Data.Attoparsec.Text exports takeWhile and takeWhile1:

takeWhile :: (Char -> Bool) -> Parser Text

Consume input as long as the predicate returns True, and return the consumed input.

This parser does not fail. It will return an empty string if the predicate returns False on the first character of input.

[...]
takeWhile1 :: (Char -> Bool) -> Parser Text

Consume input as long as the predicate returns True, and return the consumed input.

This parser requires the predicate to succeed on at least one character of input: it will fail if the predicate never returns True or if there is no input left.
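To make the difference between the two documented behaviours concrete, here is a minimal sketch. The names emptyMatch, noMatch and someMatch are made up for this illustration, and isDigit is just an arbitrary predicate.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Prelude              hiding (takeWhile)

import           Data.Attoparsec.Text (parseOnly, takeWhile, takeWhile1)
import           Data.Char            (isDigit)
import           Data.Text            (Text)

-- takeWhile never fails: zero matching characters yields the empty string.
emptyMatch :: Either String Text
emptyMatch = parseOnly (takeWhile isDigit) "abc"

-- takeWhile1 needs at least one matching character, so this is a Left.
noMatch :: Either String Text
noMatch = parseOnly (takeWhile1 isDigit) "abc"

-- Once at least one character matches, takeWhile1 returns the matching prefix.
someMatch :: Either String Text
someMatch = parseOnly (takeWhile1 isDigit) "123abc"
```

Here emptyMatch is Right "", noMatch is a Left carrying an error message, and someMatch is Right "123".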

The attoparsec documentation encourages the user to

Use the Text-oriented parsers whenever possible, e.g. takeWhile1 instead of many1 anyChar. There is about a factor of 100 difference in performance between the two kinds of parser.
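As a rough illustration of the two kinds of parser the documentation contrasts, the sketch below parses a run of digits both ways. The names digitsSlow and digitsFast are hypothetical, and many1 (satisfy isDigit) is used here as a character-by-character stand-in for the many1 anyChar style mentioned in the quote; no timings are claimed.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Data.Attoparsec.Text (Parser, many1, parseOnly, satisfy,
                                       takeWhile1)
import           Data.Char            (isDigit)
import           Data.Text            (Text)
import qualified Data.Text            as T

-- Character-by-character: builds a String (a list of Char), then packs it.
digitsSlow :: Parser Text
digitsSlow = T.pack <$> many1 (satisfy isDigit)

-- Text-oriented: returns the matched run as Text, with no intermediate list.
digitsFast :: Parser Text
digitsFast = takeWhile1 isDigit

-- Both parsers accept the same prefix of the input.
slowResult, fastResult :: Either String Text
slowResult = parseOnly digitsSlow "2015 rest"
fastResult = parseOnly digitsFast "2015 rest"
```

Both results are Right "2015"; the two parsers differ only in how the result is built.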

These two parsers are very useful, but I keep feeling the need for a more general version of takeWhile1, or more precisely, some hypothetical parser

takeWhileLo :: (Char -> Bool) -> Int -> Parser Text
takeWhileLo f lo = undefined

that would parse at least lo characters satisfying the predicate f, where lo is an arbitrary non-negative integer.

I've looked at the implementation of takeWhile1, but it uses a bunch of functions private to Data.Attoparsec.Text.Internal and doesn't seem easy to generalize.

I came up with the following applicative implementation:

{-# LANGUAGE OverloadedStrings #-}

import           Prelude                  hiding ( takeWhile )

import           Control.Applicative             ( (<$>), (<*>) )
import           Data.Text                       ( Text )
import qualified Data.Text           as T

import           Data.Attoparsec.Text

takeWhileLo :: (Char -> Bool) -> Int -> Parser Text
takeWhileLo f lo =
  T.append . T.pack <$> count lo (satisfy f) <*> takeWhile f

It works as advertised,

λ> parseOnly (takeWhileLo (== 'a') 4) "aaa"
Left "not enough input"
λ> parseOnly (takeWhileLo (== 'a') 4) "aaaa"
Right "aaaa"
λ> parseOnly (takeWhileLo (== 'a') 4) "aaaaaaaaaaaaa"
Right "aaaaaaaaaaaaa"

but the need to pack the intermediate list of results returned by count bothers me, especially for cases where lo is large. It seems to go against the recommendation to

use the Text-oriented parsers whenever possible [...]

Am I missing something? Is there a more efficient / idiomatic way to implement such a takeWhileLo combinator?

haskell attoparsec parser-combinators

Jubobs · June 30, 2015 at 19:13

1 answer

András Kovács · Accepted Answer · 2015-06-30T20:05:34+0000

Parser is a monad, so you can just inspect the return value and fail if the length isn't right:

takeWhileLo :: (Char -> Bool) -> Int -> Parser Text
takeWhileLo f lo = do
  text <- takeWhile f
  case T.compareLength text lo of
    LT -> empty
    _  -> return text

compareLength is from the text package. It is more efficient than comparing the length of text against lo, because compareLength can short-circuit.
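For reference, a self-contained sketch of the approach above, with the imports it needs spelled out (empty comes from Control.Applicative). The three test values replay the inputs from the question; their names are made up for this illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Prelude              hiding (takeWhile)

import           Control.Applicative  (empty)
import           Data.Attoparsec.Text (Parser, parseOnly, takeWhile)
import           Data.Text            (Text)
import qualified Data.Text            as T

-- Run the Text-oriented takeWhile first, then reject results that are
-- shorter than the requested lower bound.
takeWhileLo :: (Char -> Bool) -> Int -> Parser Text
takeWhileLo f lo = do
  text <- takeWhile f
  case T.compareLength text lo of
    LT -> empty          -- fewer than lo characters matched
    _  -> return text

-- The inputs from the question, replayed against this version.
tooShort, exact, longer :: Either String Text
tooShort = parseOnly (takeWhileLo (== 'a') 4) "aaa"
exact    = parseOnly (takeWhileLo (== 'a') 4) "aaaa"
longer   = parseOnly (takeWhileLo (== 'a') 4) "aaaaaaaaaaaaa"
```

tooShort is a Left (its error message need not match the "not enough input" produced by the count-based version), while exact and longer are Right "aaaa" and Right "aaaaaaaaaaaaa". If a more descriptive error is wanted, replacing empty with a call to fail and a custom message should work as well.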
