Learning Haskell `parsec`: trying to rewrite the `words` function as a basic exercise

This is a hugely basic question, and I honestly feel a little silly writing it.

TL;DR: How do I write a function that uses the `parsec` library to mimic the behavior of the `words` function from `Data.List`? Example of intended behavior:

wordsReplica "I love lamp" = ["I","love","lamp"]

      


I just read the first couple of pages of the Parsec chapter in Real World Haskell, and it would be incredibly helpful to see what a bare-bones parsing function looks like (one that does more than return its argument or return nothing). (The introductory RWH example shows how to parse a multi-line CSV file...)

So I thought rewriting `words` with `parsec` would be a helpful, basic exercise. This turned out to be less basic (for me) than I expected...

Below is my attempt; unfortunately, it produces an "unexpected end of input" error (at runtime) no matter what input I give it. I've tried reading the descriptions/definitions of the basic functions in the `parsec` library on haskell.org, but they aren't very illustrative, at least for someone who hasn't done any parsing before, in any language.

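import Text.Parsec

-- Run wordsReplica over the whole input string.
testParser :: String -> Either ParseError [[String]]
testParser input = parse wordsReplica "(unknown)" input
  where
    -- Words separated by single spaces, with the whole
    -- sequence ended by a space.
    wordsReplica = endBy
                    (sepBy
                      (many (noneOf " "))
                      (char ' '))
                    (char ' ')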

(Sorry for the Lisp-y style with no `.` composition: when I'm learning a new function, it helps me to make the application structure super explicit.)
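A minimal sketch of why this attempt fails, assuming the `parsec` package's `Text.Parsec` and `Text.Parsec.String` modules. `endBy p sep` parses zero or more occurrences of `p`, each of which must be followed by `sep`. Here `p` is the entire `sepBy` parser, which already consumes every word, so `endBy` then demands one more space at the end of the input and fails after having consumed input. The names `attempt` and `innerOnly` are hypothetical.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The attempt above: endBy wraps a sepBy, so every *list of words*
-- must be followed by a ' '.
attempt :: Parser [[String]]
attempt = endBy (sepBy (many (noneOf " ")) (char ' ')) (char ' ')

-- Only the inner sepBy, without the outer endBy (hypothetical name).
innerOnly :: Parser [String]
innerOnly = sepBy (many (noneOf " ")) (char ' ')

main :: IO ()
main = do
  -- sepBy eats all of "I love lamp"; endBy then wants a trailing ' ',
  -- hits end of input, and reports an error.
  print (parse attempt "" "I love lamp")
  -- Dropping the outer endBy leaves a flat list of words.
  print (parse innerOnly "" "I love lamp")
```

The second call should print `Right ["I","love","lamp"]`, which is essentially the direction the updates below take.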

Update:
Here's something that's a step in the right direction (but still not quite there, since it doesn't handle numbers):

λ: let wordsReplica = sepBy (many letter) (char ' ')
λ: parse wordsReplica "" "i love lamp 867 5309"
Right ["i","love","lamp",""]
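A small sketch of what happens in this update, assuming `Text.Parsec`: `parse` returns as soon as the parser succeeds, so it does not have to read the whole input. After the space before `867`, `many letter` matches zero letters (producing the trailing `""`), and the next `char ' '` fails on `8` without consuming anything, so `sepBy` simply stops. Appending `eof` turns the leftover input into an error. The name `lettersOnly` is hypothetical.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The update's parser: runs of letters separated by single spaces.
lettersOnly :: Parser [String]
lettersOnly = sepBy (many letter) (char ' ')

main :: IO ()
main = do
  -- Succeeds but silently leaves "867 5309" unread.
  print (parse lettersOnly "" "i love lamp 867 5309")
  -- Requiring end of input makes the unread digits a parse error.
  print (parse (lettersOnly <* eof) "" "i love lamp 867 5309")
```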

Update 2:

It looks like this function does the job, although I'm not sure how idiomatic it is:

λ: let wordsReplica = sepBy (many (satisfy (not . isSpace))) (char ' ')
wordsReplica :: Stream s m Char => ParsecT s u m [[Char]]

λ: parse wordsReplica "" "867 5309 i love lamp %all% !(nonblanks are $$captured$$"

Right ["867","5309","i","love","lamp","%all%","!(nonblanks","are","$$captured$$"]
it :: Either ParseError [[Char]]



1 answer


Regarding your Update 2 ("It looks like this function does the job, although I'm not sure how idiomatic it is"):

That's a good start, but it doesn't quite behave the way you intend:

> words "Hello      world"
["Hello","world"]

> parse wordsReplica "" "Hello      world"
Right ["Hello","","","","","","world"]

Not exactly what you want. After all, a word should contain at least one character. But if you change `many` to `many1`, you'll run into another error:

> parse wordsReplicaMany1 "" "Hello      world"
Left (line 1, column 7):
unexpected " "

This is because your separator parser isn't greedy enough. Instead of parsing a single space, parse as much whitespace as you can:

nonSpace      = satisfy $ not . isSpace
wordsReplica' = many1 nonSpace `sepBy` spaces
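A self-contained version of the two lines above, assuming the `parsec` package (`Text.Parsec`, `Text.Parsec.String`); the imports and type signatures are additions. If I read Parsec's semantics correctly, `wordsReplica'` still returns a `Left` when the input ends in whitespace (`spaces` consumes the trailing blanks, then `many1 nonSpace` fails after input was consumed), and it returns `Right []` on input that starts with whitespace. The `wordsLike` parser is a hypothetical variant, not from the answer: it skips leading whitespace, treats whitespace as a terminator after each word, and requires `eof`, which brings it closer to `Data.List.words`.

```haskell
import Data.Char (isSpace)
import Text.Parsec
import Text.Parsec.String (Parser)

-- One non-whitespace character.
nonSpace :: Parser Char
nonSpace = satisfy (not . isSpace)

-- The answer's parser: words separated by runs of whitespace.
-- `spaces` skips zero or more whitespace characters.
wordsReplica' :: Parser [String]
wordsReplica' = many1 nonSpace `sepBy` spaces

-- Hypothetical variant: skip leading whitespace, let each word swallow
-- the whitespace after it, and insist on consuming the whole input.
wordsLike :: Parser [String]
wordsLike = spaces *> many (many1 nonSpace <* spaces) <* eof

main :: IO ()
main = do
  print (parse wordsReplica' "" "Hello      world")
  print (parse wordsLike "" "  Hello      world  ")
  -- For comparison with the Prelude function.
  print (words "  Hello      world  ")
```

The design choice in `wordsLike` is to use whitespace as a terminator rather than a separator, so the parser never has to decide whether a run of spaces is followed by another word.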

      
