How to print a parse tree in Yacc (Bison)

I made a parser for the C language using Bison and Flex. It works: it prints "syntax error" in the terminal if the input C code is syntactically invalid, and otherwise prints nothing.

But I want my parser to print the parse tree for the given input C code as its output. How should I do it? Is there a function in Bison that can be used to print a parse tree?


1 answer


The TXR language (http://www.nongnu.org/txr) uses Flex and Yacc to parse its input. You can see the parse tree if you give it the -v option.

For example:

$ ./txr -v -c "@/[a-z]*|foo/"
spec:
(((text (#<sys:regex: 9d99268> or (0+ (set (#\a . #\z))) (compound #\f #\o #\o)))))

      

You build the tree in the parser actions and print it yourself with a tree-printing routine. I've used a Lisp-like representation to make life easier. The output is produced by a recursive printing function that recognizes every possible object type and renders it in the appropriate notation. For example, above you can see character objects printed in hash-backslash notation (such as #\a), and the compiled regex, an opaque object with no printable form, printed using the #< ... > notation.
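To make the idea concrete outside of TXR, here is a minimal sketch in plain C. It is not TXR's code: the `node` type and the `mk_node` and `tree_to_str` functions are hypothetical names invented for this illustration, and the tree is limited to a label with at most two children. A Bison action could call the constructor, for instance `{ $$ = mk_node("+", $1, $3); }`, assuming the semantic value type is declared as `node *`.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical AST node for illustration: a label and up to two children. */
typedef struct node {
    const char *label;
    struct node *left, *right;
} node;

/* Allocate a node. Intended to be called from a parser action. */
static node *mk_node(const char *label, node *left, node *right)
{
    node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->label = label;
    n->left = left;
    n->right = right;
    return n;
}

/* Append a single character to buf if there is room for it and the NUL. */
static void append_char(char *buf, size_t cap, char c)
{
    size_t len = strlen(buf);
    if (len + 1 < cap) {
        buf[len] = c;
        buf[len + 1] = '\0';
    }
}

/* Recursively append the tree to buf in Lisp-like notation.
   buf must hold a NUL-terminated string (possibly empty) of capacity cap.
   A leaf is written as its label; an interior node as (label child ...). */
static void tree_to_str(const node *n, char *buf, size_t cap)
{
    size_t len = strlen(buf);

    if (n == NULL || len + 1 >= cap)
        return;

    if (n->left == NULL && n->right == NULL) {
        snprintf(buf + len, cap - len, "%s", n->label);
        return;
    }

    snprintf(buf + len, cap - len, "(%s", n->label);
    if (n->left != NULL) {
        append_char(buf, cap, ' ');
        tree_to_str(n->left, buf, cap);
    }
    if (n->right != NULL) {
        append_char(buf, cap, ' ');
        tree_to_str(n->right, buf, cap);
    }
    append_char(buf, cap, ')');
}
```

With the start symbol's action storing its tree in a variable, the caller of yyparse() can pass that tree to tree_to_str with an empty buffer and print the result. For a tree built as mk_node("+", a, mk_node("*", b, c)), where a, b and c are leaf nodes labelled "a", "b" and "c", the buffer ends up holding (+ a (* b c)).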



Here's some of the grammar:

regexpr : regbranch                     { $$ = if3(cdr($1), 
                                                   cons(compound_s, $1),
                                                   car($1)); }
        | regexpr '|' regexpr           { $$ = list(or_s, $1, $3, nao); }
        | regexpr '&' regexpr           { $$ = list(and_s, $1, $3, nao); }
        | '~' regexpr                   { $$ = list(compl_s, $2, nao); }
        | /* empty */ %prec LOW         { $$ = nil; }
        ;

      

As you can see, building the AST is just a matter of constructing nested lists. This form is very easy to compile. The top-level function of the NFA-based regex compiler is very readable:

/*
 * Input is the items from a regex form,
 * not including the regex symbol.
 * I.e.  (rest '(regex ...)) not '(regex ...).
 */
static nfa_t nfa_compile_regex(val exp)
{
  if (nullp(exp)) {
    nfa_state_t *acc = nfa_state_accept();
    nfa_state_t *s = nfa_state_empty(acc, 0);
    return nfa_make(s, acc);
  } else if (typeof(exp) == chr_s) {
    nfa_state_t *acc = nfa_state_accept();
    nfa_state_t *s = nfa_state_single(acc, c_chr(exp));
    return nfa_make(s, acc);
  } else if (exp == wild_s) {
    nfa_state_t *acc = nfa_state_accept();
    nfa_state_t *s = nfa_state_wild(acc);
    return nfa_make(s, acc);
  } else {
    val sym = first(exp), args = rest(exp);

    if (sym == set_s) {
      return nfa_compile_set(args, nil);
    } else if (sym == cset_s) {
      return nfa_compile_set(args, t);
    } else if (sym == compound_s) {
      return nfa_compile_list(args);
    } else if (sym == zeroplus_s) {
      nfa_t nfa_arg = nfa_compile_regex(first(args));
      nfa_state_t *acc = nfa_state_accept();
      /* New start state has empty transitions going through
         the inner NFA, or skipping it right to the new acceptance state. */
      nfa_state_t *s = nfa_state_empty(nfa_arg.start, acc);
      /* Convert acceptance state of inner NFA to one which has
         an empty transition back to the start state, and
         an empty transition to the new acceptance state. */
      nfa_state_empty_convert(nfa_arg.accept, nfa_arg.start, acc);
      return nfa_make(s, acc);
    } else if (sym == oneplus_s) {
      /* One-plus case differs from zero-plus in that the new start state
         does not have an empty transition to the acceptance state.
         So the inner NFA must be traversed once. */
      nfa_t nfa_arg = nfa_compile_regex(first(args));
      nfa_state_t *acc = nfa_state_accept();
      nfa_state_t *s = nfa_state_empty(nfa_arg.start, 0); /* <-- diff */
      nfa_state_empty_convert(nfa_arg.accept, nfa_arg.start, acc);
      return nfa_make(s, acc);
    } else if (sym == optional_s) {
      /* In this case, we can keep the acceptance state of the inner
         NFA as the acceptance state of the new NFA. We simply add
         a new start state which can short-circuit to it via an empty
         transition.  */
      nfa_t nfa_arg = nfa_compile_regex(first(args));
      nfa_state_t *s = nfa_state_empty(nfa_arg.start, nfa_arg.accept);
      return nfa_make(s, nfa_arg.accept);
    } else if (sym == or_s) {
      /* Simple: make a new start and acceptance state, which form
         the ends of a spindle that goes through two branches. */
      nfa_t nfa_first = nfa_compile_regex(first(args));
      nfa_t nfa_second = nfa_compile_regex(second(args));
      nfa_state_t *acc = nfa_state_accept();
      /* New state s has empty transitions into each inner NFA. */
      nfa_state_t *s = nfa_state_empty(nfa_first.start, nfa_second.start);
      /* Acceptance state of each inner NFA converted to empty
         transition to new combined acceptance state. */
      nfa_state_empty_convert(nfa_first.accept, acc, 0);
      nfa_state_empty_convert(nfa_second.accept, acc, 0);
      return nfa_make(s, acc);
    } else {
      internal_error("bad operator in regex");
    }
  }
}

      
