How can I disable the maximum munch rule in Lex?

Suppose I want to handle certain patterns specially and have the rest of the text (VHDL code) appear unchanged in the output file.

For this purpose, I would have to write a catch-all rule at the end, like this:

(MY_PATTERN){
// do something with my pattern
}

(.*){
return TOK_VHDL_CODE;

}

      

The problem with this strategy is that MY_PATTERN is useless in this case: under the maximal munch rule, the text will be matched by .* instead.

So how can I get this functionality?



2 answers


The easiest way is to get rid of the * in your default rule at the end and just use

.    { append_to_buffer(*yytext); }

      



so that your default rule picks up everything that does not match the earlier rules, one character at a time, and appends it to a buffer to be processed elsewhere.
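The answer does not show append_to_buffer, so the following is only a minimal sketch of what such a helper might look like in plain C. The buffer variables and the take_buffer function are assumptions made for illustration; they are not part of lex.

```c
#include <stdlib.h>

/* Hypothetical growable buffer holding the text no other rule matched. */
static char  *buf = NULL;
static size_t buf_len = 0, buf_cap = 0;

/* Append one character, keeping the buffer NUL-terminated.
 * Returns 0 on success, -1 if memory could not be allocated. */
int append_to_buffer(char c) {
    if (buf_len + 2 > buf_cap) {              /* room for c and the NUL */
        size_t new_cap = buf_cap ? buf_cap * 2 : 64;
        char *p = realloc(buf, new_cap);
        if (p == NULL) return -1;
        buf = p;
        buf_cap = new_cap;
    }
    buf[buf_len++] = c;
    buf[buf_len] = '\0';
    return 0;
}

/* Hypothetical companion: hand the accumulated text to the caller and
 * start a fresh buffer. The caller must free() the result.
 * Returns NULL if nothing was accumulated. */
char *take_buffer(void) {
    char *out = buf;
    buf = NULL;
    buf_len = buf_cap = 0;
    return out;
}
```

A rule for MY_PATTERN could then call take_buffer() first, so that the pending VHDL text is dealt with before the pattern itself.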



In theory, you can find a regular expression that matches any string not containing a given pattern, but except for very simple patterns, such an expression is neither easy to write nor readable.
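To give a feel for how quickly this gets awkward, here is a sketch for about the simplest possible case: strings that do not contain the two-character sequence ab. The function name lacks_ab is made up for illustration, and the code assumes a POSIX system providing regex.h; it is not lex code, only a way to try the expression out.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if s contains no occurrence of "ab", 0 if it does,
 * and -1 if the expression fails to compile. */
int lacks_ab(const char *s) {
    regex_t re;
    /* Zero or more units that can neither start nor complete "ab"
     * (a non-a character, or a run of a's followed by something that
     * is neither a nor b), then optional trailing a's. */
    const char *pattern = "^([^a]|a+[^ab])*a*$";
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Even for a two-character pattern the expression is hard to read at a glance, and it grows quickly for longer or overlapping patterns.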

If all you want to do is find (and react to) specific patterns, you can use the default rule that matches a single character and does nothing:

{Pattern1}   { /* Do something with the pattern */ }
{Pattern2}   { /* Do something with the pattern */ }
.|\n         /* Default rule does nothing */

      

If, on the other hand, you want to do something with the strings that do not match any pattern (as in your example), you need the default rule to accumulate those strings, and the pattern rules to "send" (return) the accumulated token before acting on the token they have matched. This means that some actions need to send two tokens, which is a bit awkward with the standard "parser calls scanner for a token" architecture, because it requires the scanner to maintain some state.

If your version of bison is not too old, you can use a "push parser" instead, which allows the scanner to call the parser. This makes it easy to send two tokens from one action. Otherwise, you need to build some kind of state machine into your scanner.
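For the pull-style alternative, the state the scanner has to keep can be as small as one held-back token. The following is a rough sketch in plain C rather than lex, just to show the idea; the token codes, the Scanner struct, and the "@@" marker standing in for MY_PATTERN are all assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

enum { TOK_EOF = 0, TOK_TEXT, TOK_MARK };   /* hypothetical token codes */

#define MARK "@@"   /* hypothetical stand-in for MY_PATTERN */

typedef struct {
    const char *p;    /* current read position in the input */
    int pending;      /* token held back for the next call, or -1 */
} Scanner;

void scanner_init(Scanner *s, const char *input) {
    s->p = input;
    s->pending = -1;
}

/* Copy n bytes starting at start into a fresh NUL-terminated string. */
static char *copy_text(const char *start, size_t n) {
    char *t = malloc(n + 1);
    if (t != NULL) { memcpy(t, start, n); t[n] = '\0'; }
    return t;
}

/* Pull interface: returns one token per call. For TOK_TEXT, *text is a
 * malloc'd copy of the unmatched text (NULL if allocation failed);
 * for other tokens *text is NULL. */
int next_token(Scanner *s, char **text) {
    *text = NULL;
    if (s->pending >= 0) {            /* deliver the held-back token */
        int tok = s->pending;
        s->pending = -1;
        return tok;
    }
    const char *start = s->p;
    const char *hit = strstr(start, MARK);
    const char *end = hit ? hit : start + strlen(start);
    s->p = hit ? hit + strlen(MARK) : end;
    if (end > start) {
        /* Unmatched text goes out first; remember the pattern token. */
        if (hit) s->pending = TOK_MARK;
        *text = copy_text(start, (size_t)(end - start));
        return TOK_TEXT;
    }
    return hit ? TOK_MARK : TOK_EOF;
}
```

In a real lex scanner the same effect would need a variable that survives between calls to yylex, checked at the top of the function before any input is matched.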



Below is a simple example using a push parser (it still needs, among other things, the pattern definitions to be filled in).

%{
  #include <stdlib.h>
  #include <string.h>
  #include "parser.tab.h"
  /* Since the lexer calls the parser and we call the lexer,
   * we pass the parser (state) through to the lexer. This is
   * how you change the `yylex` prototype:
   */
  #define YY_DECL static int yylex(yypstate* parser)
%}

pattern1   ...
pattern2   ...

/* Standard "avoid warnings" options */
%option noyywrap noinput nounput nodefault

%%
  /* Indented code before the first pattern is inserted at the beginning
   * of yylex, perfect for local variables.
   */
  size_t vhdl_length = 0;
  /* These are macros because they do out-of-sequence return on error. */
  /* If you don't like macros, please accept my apologies for the offense. */
  #define SEND_(toke, leng) do { \
    size_t leng_ = leng; \
    char* text = memmove(malloc(leng_ + 1), yytext, leng_); \
    text[leng_] = 0; \
    int status = yypush_parse(parser, toke, &text); \
    if (status != YYPUSH_MORE) return status; \
  } while(0);
  #define SEND_TOKEN(toke) SEND_(toke, yyleng)
  #define SEND_TEXT do if(vhdl_length){ \
    SEND_(TEXT, vhdl_length); \
    yytext += vhdl_length; yyleng -= vhdl_length; vhdl_length = 0; \
  } while(0);

{pattern1}   { SEND_TEXT; SEND_TOKEN(TOK_1); }
{pattern2}   { SEND_TEXT; SEND_TOKEN(TOK_2); }
  /* Default action just registers that we have one more char and
   * calls yymore() to keep accumulating the token.
   */
.|\n      { ++vhdl_length; yymore(); }
  /* In the push model, we're responsible for sending EOF to the parser */
<<EOF>>   { SEND_TEXT; return yypush_parse(parser, 0, 0); }

%%

/* In this model, the lexer drives everything, so we provide the
 * top-level interface here.
 */

int parse_vhdl(FILE* in) {
  yyin = in;
  /* Create a new pure push parser */
  yypstate* parser = yypstate_new();
  int status = yylex(parser);
  yypstate_delete(parser);
  return status;
}

      

To actually get this to work with bison, you need to provide a few additional options:

parser.y

%code requires {
  /* requires blocks get copied into the tab.h file */
  /* Don't do this if you prefer a %union declaration, of course */
  #define YYSTYPE char*
}
%code {
  #include <stdio.h>
  void yyerror(const char* msg) { fprintf(stderr, "%s\n", msg); }
}

%define api.pure full
%define api.push-pull push

      
