Learning Haskell `parsec`: Trying to rewrite the `words` function as a basic exercise
This is a very basic question, and I honestly feel a little silly writing it.
TL;DR: How do I write a function that uses the `parsec` library to replicate the behavior of the `words` function from `Data.List`? Example of intended behavior:
wordsReplica "I love lamp" = ["I","love","lamp"]
I just read the first couple of pages of the Parsec chapter of Real World Haskell, and it would be incredibly helpful to see what a minimal parsing function looks like (one that does more than return its argument or return nothing). (The introductory RWH example shows how to parse a multi-line CSV file ...)
So I thought that rewriting `words` with `parsec` would be a helpful, basic exercise ... This turned out to be less basic (for me) ...
Below is my attempt; unfortunately it produces an "unexpected end of input" error (at runtime) no matter what I give it. I've tried reading the descriptions and definitions of the simple functions in the `parsec` library documentation at haskell.org, but they are not very illustrative, at least for someone who hasn't done parsing before, including in other languages.
testParser :: String -> Either ParseError [[String]]
testParser input = parse wordsReplica "(unknown)" input
  where
    wordsReplica = endBy
      (sepBy
        (many (noneOf " "))
        (char ' '))
      (char ' ')
(Sorry for the Lisp-y, non-point-free style; when I'm learning a new function, it helps me to make the nesting and structure super explicit.)
Update:
Here's something that's a step in the right direction (but still not quite there, as it doesn't handle numbers):
λ: let wordsReplica = sepBy (many letter) (char ' ')
λ: parse wordsReplica "" "i love lamp 867 5309"
Right ["i","love","lamp",""]
Update 2:
It looks like this function does the job, although I'm not sure how idiomatic it is:
λ: let wordsReplica = sepBy (many (satisfy(not . isSpace))) (char ' ')
wordsReplica :: Stream s m Char => ParsecT s u m [[Char]]
λ: parse wordsReplica "" "867 5309 i love lamp %all% !(nonblanks are $$captured$$"
Right ["867","5309","i","love","lamp","%all%","!(nonblanks","are","$$captured$$"]
it :: Either ParseError [[Char]]
Update 2: "It looks like this function does the job, although I'm not sure how idiomatic it is."
It's a good start, but it doesn't behave the way you intend when words are separated by more than one space:
> words "Hello      world"
["Hello","world"]
> parse wordsReplica "" "Hello      world"
Right ["Hello","","","","","","world"]
Not exactly what you want. After all, a word must contain at least one character. But if you change `many` to `many1`, you will notice another error:
> parse wordsReplicaMany1 "" "Hello      world"
Left (line 1, column 7):
unexpected " "
This is because your separator parser is not greedy enough. Instead of parsing a single space, parse as many as you can:
nonSpace = satisfy $ not . isSpace
wordsReplica' = many1 nonSpace `sepBy` spaces
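For anyone who wants to load this into GHCi, here is a minimal, self-contained sketch of the parser above with imports and type signatures added. The `Parser` type from `Text.Parsec.String` is my choice for keeping the signatures short. `wordsLike` and `runWords` are hypothetical names that are not part of the answer: `wordsLike` is one possible variant that also tolerates leading and trailing whitespace, which `wordsReplica'` does not (after the last word, `spaces` consumes the trailing blanks and `many1 nonSpace` then fails at the end of input).

```haskell
import Data.Char (isSpace)
import Text.Parsec (ParseError, eof, many, many1, parse, satisfy, sepBy, spaces)
import Text.Parsec.String (Parser)

-- A single non-whitespace character.
nonSpace :: Parser Char
nonSpace = satisfy (not . isSpace)

-- The parser from the answer: words of at least one character,
-- separated by any amount of whitespace.
wordsReplica' :: Parser [String]
wordsReplica' = many1 nonSpace `sepBy` spaces

-- Hypothetical variant (not from the answer): skip leading whitespace,
-- let every word consume the whitespace that follows it, and require
-- that the whole input has been consumed.
wordsLike :: Parser [String]
wordsLike = spaces *> many (many1 nonSpace <* spaces) <* eof

-- Hypothetical helper: run a word parser with an empty source name.
runWords :: Parser [String] -> String -> Either ParseError [String]
runWords p = parse p ""
```

With these definitions, `runWords wordsReplica' "Hello      world"` should give `Right ["Hello","world"]`, while `runWords wordsReplica' "Hello world "` returns a `Left` because of the trailing space. `runWords wordsLike "  Hello   world  "` should agree with `words` on the same input.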