Error handling with Flex (lex) and Bison (yacc)

From the Bison manual:

In a simple interactive command parser where each input is a single line, it can be enough to let yyparse return 1 on error and have the caller ignore the rest of the input line when it does (and then call yyparse again).

This is pretty much what I want, but I'm having trouble getting it to work. Basically, I want to detect errors in flex as well, and when an error is found, have Bison discard the entire line. What I have now doesn't behave that way, because my commands are still being executed:

kbsh: ls '/home
Error: Unterminated Single Quote
admin  kbrandt  tempuser
syntax error
kbsh: 

      

In my Bison file:

commands:
     /*Empty*/ { prompt(); } |
     command { prompt(); }
    ;

command:
    error {return 1; } |
    chdir_command |
    pwd_command |
    exit_command |
    WORD arg_list {
        execute_command($1, $2);
        //printf("%s, %s\n", $1, $2); 
    } |
    WORD { execute_command($1, NULL); }
    ;

      

And in my Flex:

'   {BEGIN inQuote; }

<inQuote>\n {printf("Error: Unterminated Single Quote\n"); BEGIN(0); return(ERROR);}

      



1 answer


I don't think you will find an easy way to handle these kinds of parsing errors in the lexer.

I would keep the lexer (flex / lex) as dumb as possible: it should just provide a stream of basic tokens (identifiers, keywords, etc.) and leave the error detection to the parser (yacc / bison). In fact, the parser is set up for exactly what you want, with a slight restructuring of your approach ...

In the lexer (parser.l), keep it simple (no error handling at end of line / newline), something like this (not complete):

%}

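/* The backslash before ' is optional in flex; the one before " is needed,
   because an unescaped " starts a literal string in a pattern.
   [^...\n]* stops at the first closing quote; .* would run to the last one. */
SINGLE_QUOTE_STRING \'[^\'\n]*\'
DOUBLE_QUOTE_STRING \"[^\"\n]*\"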

%%
{SINGLE_QUOTE_STRING} {
    yylval.charstr = copy_to_tmp_buffer(yytext);  // implies a %union
    return STRING;
}
{DOUBLE_QUOTE_STRING} {
    yylval.charstr = copy_to_tmp_buffer(yytext);  // implies a %union
    return STRING;
}
\n   return NEWLINE;
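
To make it concrete what a quoted-string token should cover, here is a small stand-alone C sketch of the pattern "quote, any run of characters that are neither a quote nor a newline, closing quote". The helper name single_quoted_len is hypothetical and is not part of flex; it only models the match by hand. Flex always takes the longest match, so a pattern built on .* would swallow everything up to the last quote on the line, while the character-class form stops at the first closing quote.

```c
#include <stddef.h>

/* Hypothetical helper (not a flex API): length of the text that the
   pattern '[^'\n]*' would match at the start of s, or 0 when there is
   no closing quote before the end of the line. */
static size_t single_quoted_len(const char *s) {
    if (s == NULL || s[0] != '\'') {
        return 0;                      /* must start with a quote */
    }
    size_t i = 1;
    /* Skip everything that is neither a quote nor a line/string end. */
    while (s[i] != '\0' && s[i] != '\n' && s[i] != '\'') {
        i++;
    }
    /* Only a closing quote completes the token. */
    return s[i] == '\'' ? i + 1 : 0;
}
```

For the input 'a' 'b' this reports a match of length 3 (just 'a'), and for an unterminated '/home followed by a newline it reports no match, so no STRING token is produced and the parser is left to notice the problem.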

      

Then, in your parser.y file, do all the real processing (it's not a complete thing):

command:
    error NEWLINE
        { yyclearin; yyerrok; print_the_next_command_prompt(); }
    | chdir_command STRING NEWLINE
        { do_the_chdir($<charstr>2); print_the_next_command_prompt(); }
    | ... and so on ...

      



There are two things here:

  • Pushing tokens like NEWLINE over to the yacc side lets you tell when the user has finished a command, so you can clear things up and start over (assuming you have "int yywrap() {return 1;}" somewhere). If you try to detect the error too early, in flex, how do you know when to raise it?
  • chdir is no longer a single command (unless it has sub-rules that you just didn't show); it is now chdir_command STRING, where STRING is the argument to chdir. This lets the parser figure out what went wrong, and you can call yyerror if that directory doesn't exist, etc.

So you should end up with something like this (guessing at what chdir might look like):

cd 'some_directory
syntax error
cd 'some_directory'
you are in some_directory dude!

And this is all handled by the yacc grammar, not the tokenizer.
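
To see why waiting for NEWLINE matters, here is a hand-written C model of the idea. It is only a sketch, not Bison output: the token names (TOK_WORD, TOK_STRING, TOK_NEWLINE, TOK_BAD, TOK_EOF) and the functions next_token and parse_lines are made up for illustration, and TOK_BAD simply stands for "anything the grammar would not accept", such as an unterminated quote. The point is that nothing is accepted until the whole line, including its NEWLINE, has been read, which is roughly what a rule ending in NEWLINE plus an "error NEWLINE" rule gives you.

```c
#include <stddef.h>

/* Token kinds, loosely mirroring the grammar's terminals (names are illustrative). */
enum token { TOK_WORD, TOK_STRING, TOK_NEWLINE, TOK_BAD, TOK_EOF };

/* Scan one token starting at *p and advance *p past it. */
static enum token next_token(const char **p) {
    while (**p == ' ' || **p == '\t') (*p)++;
    if (**p == '\0') return TOK_EOF;
    if (**p == '\n') { (*p)++; return TOK_NEWLINE; }
    if (**p == '\'') {
        const char *q = *p + 1;
        while (*q != '\0' && *q != '\n' && *q != '\'') q++;
        if (*q == '\'') { *p = q + 1; return TOK_STRING; }
        *p = q;                 /* unterminated: stop before the newline */
        return TOK_BAD;
    }
    while (**p != '\0' && **p != ' ' && **p != '\t' &&
           **p != '\n' && **p != '\'') (*p)++;
    return TOK_WORD;
}

/* Walk every line of input. Returns how many lines would be executed and
   stores the number of rejected lines in *errors. */
static int parse_lines(const char *input, int *errors) {
    const char *p = input;
    int accepted = 0;
    *errors = 0;
    for (;;) {
        int ntokens = 0, bad = 0;
        enum token t;
        /* Read through NEWLINE before deciding anything about the line. */
        while ((t = next_token(&p)) != TOK_NEWLINE && t != TOK_EOF) {
            if (t == TOK_BAD) bad = 1;
            ntokens++;
        }
        if (bad) (*errors)++;             /* whole line discarded */
        else if (ntokens > 0) accepted++; /* safe to run the command now */
        if (t == TOK_EOF) break;
    }
    return accepted;
}
```

Feeding it the two-line session above (the unterminated cd followed by the corrected one) accepts one command and rejects one line. Compare that with a rule like "WORD { execute_command($1, NULL); }" that has no trailing NEWLINE: the action can run as soon as the parser meets a token it cannot shift, before the error is reported, which would explain why the ls in the question still executed.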

I've found that keeping flex as simple as possible gives you the most flexibility. :)
