Parsing multiple lines into a list of lists in Haskell

Question

I am trying to parse a file that looks like this:

a b c 
f e d

I want to map each character to a value of my own type and parse everything into a list of lists, for example:

[[A, B, C], [D, E, F]]

To do this, I tried the following:

import           Control.Monad
import           Text.ParserCombinators.Parsec
import           Text.ParserCombinators.Parsec.Language
import qualified Text.ParserCombinators.Parsec.Token    as P

parserP :: Parser [[MyType]]
parserP = do
  x  <- rowP
  xs <- many (newline >> rowP)
  return (x : xs)

rowP :: Parser [MyType]
rowP = manyTill cellP $ void newline <|> eof

cellP :: Parser MyType
cellP = aP <|> bP <|> ... -- rest of the parsers, they all look very similar

aP :: Parser MyType
aP = symbol "a" >> return A

bP :: Parser MyType
bP = symbol "b" >> return B

lexer = P.makeTokenParser emptyDef
symbol  = P.symbol lexer

But it does not return multiple inner lists. Instead, I get:

[[A, B, C, D, E, F]]

What am I doing wrong? I expected manyTill to apply cellP until it reached a newline, but that is not what happens.

+3

parsing haskell parsec

Jesuspc June 29. 17 at 20:49

2 answers

Parser combinators are overkill for this. I would use lines :: String -> [String] and words :: String -> [String] to split the input, and then map the individual tokens to MyType values.

toMyType :: String -> Maybe MyType
toMyType "a" = Just A
toMyType "b" = Just B
toMyType "c" = Just C
toMyType _ = Nothing

parseMyType :: String -> Maybe [[MyType]]
parseMyType = traverse (traverse toMyType) . fmap words . lines
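
A possible variation on the same idea, sketched here with Either instead of Maybe so that a failure reports which token was not recognised. The MyType constructors and the error message wording are assumptions for illustration; the question does not show the actual type.

```haskell
-- Assumed stand-in for the asker's type; the real definition is not shown.
data MyType = A | B | C | D | E | F
  deriving (Show, Eq)

-- Convert a single token, naming the offending token on failure.
toMyType :: String -> Either String MyType
toMyType "a" = Right A
toMyType "b" = Right B
toMyType "c" = Right C
toMyType "d" = Right D
toMyType "e" = Right E
toMyType "f" = Right F
toMyType tok = Left ("unrecognised token: " ++ show tok)

-- Split into lines, split each line into words, convert every word.
-- traverse stops at the first Left it encounters.
parseMyType :: String -> Either String [[MyType]]
parseMyType = traverse (traverse toMyType . words) . lines
```

With these definitions, parseMyType "a b c \nf e d\n" evaluates to Right [[A,B,C],[F,E,D]], and an input containing an unknown token such as "a x" yields a Left naming "x".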

+5

Benjamin Hodgson June 29. 17 at 21:28

Silvio Mayolo · Accepted Answer · 2017-06-29T21:35:36+0000

You are correct that manyTill keeps parsing until it reaches a newline. But manyTill never gets to see the newline, because cellP is too eager. cellP ends with a call to P.symbol, whose documentation says:

symbol :: String -> ParsecT s u m String

Lexeme parser symbol s parses string s and skips trailing white space.

The key phrase there is "white space". It turns out that Parsec defines whitespace as any character that satisfies isSpace, which includes newlines. So P.symbol happily consumes the c, followed by the space and the newline, and then manyTill looks for a newline and doesn't see one because it has already been consumed.
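
To see this behaviour in isolation, here is a small sketch that runs P.symbol on the end of the first line and returns whatever input is left. It uses the Text.Parsec module names rather than the Text.ParserCombinators.Parsec ones from the question; restAfterSymbol, newlineIsSpace and leftover are names made up for this illustration.

```haskell
import Data.Char (isSpace)
import Text.Parsec
import Text.Parsec.Language (emptyDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as P

-- Same default lexer as in the question.
lexer :: P.TokenParser ()
lexer = P.makeTokenParser emptyDef

-- Parse one symbol, then return the input that remains afterwards.
restAfterSymbol :: String -> Parser String
restAfterSymbol s = P.symbol lexer s *> getInput

-- isSpace accepts the newline character.
newlineIsSpace :: Bool
newlineIsSpace = isSpace '\n'

-- The space and the newline after "c" are both skipped by symbol.
leftover :: Either ParseError String
leftover = parse (restAfterSymbol "c") "" "c \nf e d"
```

Here newlineIsSpace is True and leftover is Right "f e d": by the time manyTill checks for its terminator, the newline is gone.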

If you are willing to drop Parsec, go with Benjamin's solution. But if you are determined to stick with it, the basic idea is to change the lexer's whiteSpace field so that whitespace no longer includes newlines. Something like

lexer = let lexer0 = P.makeTokenParser emptyDef
        in lexer0 { P.whiteSpace = void $ many (oneOf " \t") }

This is pseudocode and probably won't work for your specific case, but that is the idea. In particular, updating the field does not by itself change parsers such as symbol, which makeTokenParser has already built around the original whitespace parser. You want whiteSpace to mean what you want to count as whitespace, not what the library defines by default. Note that changing this will also break comment syntax, if you have any defined, since whiteSpace was previously responsible for skipping comments.
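
One way to get the intended behaviour while staying with Parsec is to sidestep the token module and write a small symbol helper by hand. The following is a minimal sketch under stated assumptions, not the only way to do it: hspace and this symbol are hand-rolled helpers (not part of Parsec), the MyType constructors are guessed from the question, and the Text.Parsec module names are used instead of the Text.ParserCombinators.Parsec ones.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Assumed stand-in for the asker's type; the real definition is not shown.
data MyType = A | B | C | D | E | F
  deriving (Show, Eq)

-- Skip spaces and tabs only, so newlines stay in the input.
hspace :: Parser ()
hspace = skipMany (oneOf " \t")

-- Hand-rolled replacement for P.symbol that never eats a newline.
symbol :: String -> Parser String
symbol s = string s <* hspace

cellP :: Parser MyType
cellP = choice
  [ A <$ symbol "a"
  , B <$ symbol "b"
  , C <$ symbol "c"
  , D <$ symbol "d"
  , E <$ symbol "e"
  , F <$ symbol "f"
  ]

-- A row is one or more cells; it stops at the first non-cell character.
rowP :: Parser [MyType]
rowP = many1 cellP

-- Rows are separated by newlines, with an optional trailing newline.
parserP :: Parser [[MyType]]
parserP = rowP `sepEndBy` newline <* eof
```

With these definitions, parse parserP "" "a b c \nf e d\n" gives Right [[A,B,C],[F,E,D]]. The sketch does not handle leading whitespace on a line or blank lines between rows.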

In short, Benjamin's answer is probably the best way to go. There is no real reason to use Parsec here. But it is also good to know why this particular solution didn't work: the default Parsec language definition was not designed to treat newlines as significant.
