Attoparsec: Jump up to (but not including) the multi-char delimiter

I have a string that can contain almost any character. Somewhere inside the line there is a separator, {{{.

For example: afskjdfakjsdfkjas{{{fasdf

Using attoparsec, what is the idiomatic way to write a Parser () that skips all characters up to {{{ without consuming the {{{ itself?



2 answers


Use attoparsec's lookAhead (which applies a parser without consuming any input) together with manyTill to write a parser that consumes everything up to, but excluding, the delimiter {{{. You can then run that parser and discard its result.

{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ( (<|>) )
import Data.Text ( Text )
import qualified Data.Text as T
import Data.Attoparsec.Text
import Data.Attoparsec.Combinator ( lookAhead, manyTill )

myParser :: Parser Text
myParser = T.concat <$> manyTill (nonOpBraceSpan <|> opBraceSpan)
                                 (lookAhead $ string "{{{")
                    <?> "{{{"
  where
    opBraceSpan    = takeWhile1 (== '{')
    nonOpBraceSpan = takeWhile1 (/= '{')

      



In GHCi:

λ> :set -XOverloadedStrings 
λ> parseTest myParser "{foo{{bar{{{baz"
Done "{{{baz" "{foo{{bar"
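Since the question asks for a Parser (), the collected text can be dropped with void. The following is a minimal sketch using the same span-based idea; the names skipToDelim and afterDelim are hypothetical and only serve as illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<|>))
import Data.Attoparsec.Text
import Data.Functor (void)
import Data.Text (Text)

-- Hypothetical helper: skip everything before "{{{" and leave "{{{" unconsumed.
skipToDelim :: Parser ()
skipToDelim = void (manyTill span' (lookAhead (string "{{{")))
  where
    -- A run of non-brace characters, or a run of one or two braces
    -- (three or more would already have matched the end parser).
    span' = takeWhile1 (/= '{') <|> takeWhile1 (== '{')

-- Usage sketch: skip the prefix, read the delimiter, return the rest of the input.
afterDelim :: Parser Text
afterDelim = skipToDelim *> string "{{{" *> takeText
```

With this sketch, parseOnly afterDelim "afskjdfakjsdfkjas{{{fasdf" should yield Right "fasdf", which shows that the delimiter is still available to the parser that follows skipToDelim.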

      



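You can do it the long way, by hand:

foo = many $ do
  -- At the delimiter the alternative yields Nothing, so the pattern match
  -- fails; attoparsec backtracks, `many` stops, and "{{{" stays unconsumed.
  Just c <- fmap (const Nothing) (try $ string "{{{") <|> fmap Just anyChar
  return c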

      



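Or you can use the helper function manyTill. Wrap the end parser in lookAhead so that the delimiter is left in the input; a plain string "{{{" as the end parser would be consumed by manyTill. (try is not needed here, since attoparsec always backtracks on failure.)

foo = manyTill anyChar (lookAhead $ string "{{{")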
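Whether manyTill leaves the delimiter in the input depends entirely on its end parser. A small sketch comparing the two cases; the names consuming, nonConsuming and withRest are made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text
import Data.Text (Text)

-- End parser consumes the delimiter: "{{{" is gone from the remaining input.
consuming :: Parser String
consuming = manyTill anyChar (string "{{{")

-- End parser wrapped in lookAhead: "{{{" stays in the remaining input.
nonConsuming :: Parser String
nonConsuming = manyTill anyChar (lookAhead (string "{{{"))

-- Hypothetical helper for inspection: pair a parser's result with the leftover input.
withRest :: Parser a -> Text -> Either String (a, Text)
withRest p = parseOnly ((,) <$> p <*> takeText)
```

On the input "ab{{{cd", withRest consuming should give Right ("ab","cd"), while withRest nonConsuming should give Right ("ab","{{{cd"). Only the second matches the "up to but not including" behaviour asked for in the question.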

      
