Continuing to lex after an error occurs
I am taking a compilers course at my university, and I am doing my project in Haskell with Parsec. The lexer and parser must be separate. I use Parsec to convert a string into a list of tokens, which is then passed to a second Parsec parser that converts the token list into an AST.
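The second stage described here, a Parsec parser whose input is a token list rather than a `String`, can be sketched with `tokenPrim` from `Text.Parsec.Prim`. This is only an illustration under assumptions: the `Token` constructors, the `Expr` AST and the helper names (`satisfyTok`, `literal`, `plus`, `expr`) are hypothetical, and because these tokens carry no source positions, the stream position is simply advanced by one column per token.

```haskell
import Text.Parsec
import Text.Parsec.Pos (incSourceColumn)
import Text.Parsec.Prim (tokenPrim)

-- Hypothetical token type produced by the first (lexing) stage.
data Token = Number Integer | Plus | Unexpected
  deriving (Show, Eq)

-- Hypothetical AST produced by the second stage.
data Expr = Lit Integer | Add Expr Expr
  deriving (Show, Eq)

-- A Parsec parser whose input stream is a list of tokens, not a String.
type TokParser = Parsec [Token] ()

-- Accept one token when the test function returns Just.
-- Tokens have no positions of their own, so we advance one "column" per token.
satisfyTok :: (Token -> Maybe a) -> TokParser a
satisfyTok = tokenPrim show (\pos _ _ -> incSourceColumn pos 1)

literal :: TokParser Expr
literal = satisfyTok isNum
  where
    isNum (Number n) = Just (Lit n)
    isNum _ = Nothing

plus :: TokParser ()
plus = satisfyTok (\t -> if t == Plus then Just () else Nothing)

-- expr ::= literal ('+' literal)*, left-associative, then end of input.
expr :: TokParser Expr
expr = chainl1 literal (Add <$ plus) <* eof

main :: IO ()
main = do
  print (parse expr "<tokens>" [Number 1, Plus, Number 2, Plus, Number 3])
  -- An Unexpected token coming from the lexer surfaces as a parse error here.
  print (parse expr "<tokens>" [Number 1, Plus, Unexpected])
```

With this split, the lexer only has to turn bad input into `Unexpected` tokens; deciding that they are errors is left to the token parser.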
The problem is that the lexer has to keep lexing even after an error. To do this, I added a constructor representing an "unexpected token" to my Token datatype, and I tried sprinkling `<|> unexpected` throughout my code to generate this token on error. It's a lot of boilerplate, and it can also be hard to know where to put it.
Ideally, Parsec would do this automatically: whenever a ParseError occurs, produce an unexpected token at that position and continue parsing one position later. How should I do this?
Here is the code I have now: http://lpaste.net/8144414997276000256
For some reason, I still get a parse error, even though the Unexpected token should catch any unhandled cases.
It seems like you should be able to get away with one additional alternative in `token'`. I am assuming you have a `token'` parser that looks something like this:
token' = number <|> identifier <|> ...
I would probably have each token parser (`number`, `identifier`, etc.) handle its own trailing whitespace:
number :: Parser Token
number = Number . read <$> many1 digit <* spaces
Why not add an extra `unexpected'` alternative as a catch-all at the end of this chain?
token' = number <|> identifier <|> ... <|> unexpected'
This just consumes a single character. You could even store that character in the value to improve error messages. Then, when you use this to build a list, you will get an `Unexpected` value for each character your lexer doesn't know what to do with.
unexpected' :: Parser Token
unexpected' = Unexpected <$ anyChar
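Storing the offending character, plus the position where it occurred, is one way to get the better error messages mentioned above, and it also matches the question's wish to "produce an unexpected token at that position". A minimal sketch, assuming a hypothetical `Unexpected Char SourcePos` constructor (not part of the original code) and a cut-down token set:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical variant: Unexpected remembers the character and its position.
data Token
  = Number Integer
  | Unexpected Char SourcePos
  deriving (Show, Eq)

number :: Parser Token
number = Number . read <$> many1 digit <* spaces

-- Record the position *before* consuming the unknown character.
unexpected' :: Parser Token
unexpected' = do
  pos <- getPosition
  c <- anyChar
  return (Unexpected c pos)

token' :: Parser Token
token' = number <|> unexpected'

main :: IO ()
main = print (parse (many token') "<foo>" "12 ?3")
```

Here the `?` is reported at line 1, column 4, so a later stage can point at the exact spot instead of just saying "unexpected".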
Finally, the whole lexer is simply `many token'`. In my tests, this works fine with invalid characters in the middle:
*Main> parse (many token') "<foo>" "1 2 abc ~ ~def"
Right [Number 1,Number 2,Identifier "abc",Unexpected,Unexpected,Unexpected,Identifier "def"]
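Putting the pieces together, a self-contained module that should reproduce the session above might look like this. It is a sketch: the `Token` type with exactly `Number`, `Identifier` and `Unexpected` is an assumption (the question's real type lives in the linked paste), and `identifier` here accepts one or more characters.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Assumed token type, just large enough for the example input.
data Token
  = Number Integer
  | Identifier String
  | Unexpected
  deriving (Show, Eq)

-- Each token parser consumes its own trailing whitespace.
number :: Parser Token
number = Number . read <$> many1 digit <* spaces

-- A letter followed by any number of letters or digits.
identifier :: Parser Token
identifier = Identifier <$> ((:) <$> letter <*> many alphaNum) <* spaces

-- Catch-all: consume exactly one character nothing else recognised.
unexpected' :: Parser Token
unexpected' = Unexpected <$ anyChar

token' :: Parser Token
token' = number <|> identifier <|> unexpected'

-- The whole lexer: keep producing tokens until every alternative fails
-- without consuming input (i.e. at end of input).
lexer :: Parser [Token]
lexer = many token'

main :: IO ()
main = print (parse lexer "<foo>" "1 2 abc ~ ~def")
```

Note that the catch-all does not skip whitespace, which is why the space between the two `~` characters also becomes an `Unexpected` token.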
Note that Parsec does not backtrack by default. This means that if a parser fails partway through a token, after it has already consumed some input, `<|>` will not backtrack and try `unexpected'`: instead, you will get an error. To enable backtracking, you have to use `try` on any parser that can fail this way, which may be the cause of your bug. For example, if `identifier` requires at least two characters:
identifier :: Parser Token
identifier = Identifier <$> liftA2 (:) letter (many1 alphaNum) <* spaces
Then it can fail after consuming part of its input, and Parsec will not backtrack. But if you wrap it in `try`, it should work:
token' = number <|> try identifier <|> ...
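To see the difference concretely, here is a small sketch that runs the two-character `identifier` with and without `try` on the same input. The names `noTry` and `withTry` are made up for the comparison, and the `Token` type is assumed as before.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Token
  = Number Integer
  | Identifier String
  | Unexpected
  deriving (Show, Eq)

number :: Parser Token
number = Number . read <$> many1 digit <* spaces

-- Needs at least two characters, so on a lone letter it fails
-- *after* consuming that letter.
identifier :: Parser Token
identifier = Identifier <$> ((:) <$> letter <*> many1 alphaNum) <* spaces

unexpected' :: Parser Token
unexpected' = Unexpected <$ anyChar

-- Without try: once identifier has consumed input, <|> never reaches unexpected'.
noTry :: Parser Token
noTry = number <|> identifier <|> unexpected'

-- With try: a failed identifier rewinds the input, so unexpected' gets a turn.
withTry :: Parser Token
withTry = number <|> try identifier <|> unexpected'

main :: IO ()
main = do
  print (parse (many noTry) "" "ab x")   -- a parse error
  print (parse (many withTry) "" "ab x") -- Identifier "ab", then Unexpected
```

The lone `x` is the case where `identifier` consumes a letter and then fails; only the `try` version turns it into an `Unexpected` token instead of aborting the whole lex.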
The problem with `try` is that it can slow your code down if you're not careful. However, if you don't mind the slowdown, you can get away with simply adding `try` everywhere and backtracking a lot!
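The "`try` everywhere" approach can be packaged as a small helper so the catch-all only has to be written once. A sketch, with a hypothetical `tryAll` helper and a hypothetical `real` token that can fail after consuming input (on `4.` with no digits after the dot):

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Token
  = Real Double
  | Number Integer
  | Unexpected
  deriving (Show, Eq)

-- Hypothetical: digits '.' digits. Fails after consuming "4." on input "4.".
real :: Parser Token
real = do
  whole <- many1 digit
  _ <- char '.'
  frac <- many1 digit
  spaces
  return (Real (read (whole ++ "." ++ frac)))

number :: Parser Token
number = Number . read <$> many1 digit <* spaces

unexpected' :: Parser Token
unexpected' = Unexpected <$ anyChar

-- Wrap every alternative in try so no partial failure escapes,
-- then fall back to the one-character catch-all.
tryAll :: [Parser Token] -> Parser Token
tryAll ps = choice (map try ps) <|> unexpected'

token' :: Parser Token
token' = tryAll [real, number]

main :: IO ()
main = print (parse (many token') "" "3.5 4.")
```

Order still matters: `real` is listed before `number`, otherwise `number` would succeed on the `3` of `3.5`. The cost is that every failed `try` re-runs the next alternative from the same position, which is where the slowdown mentioned above comes from.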