How to do variable interpolation with Flex / Lex and Yacc / Bison

I am just learning flex / bison and I am writing my own shell with it. I am trying to find a good way to do variable interpolation (in the sense of Wikipedia's definition of interpolation). My initial approach was to have flex scan for something like ~ for my home directory or $myVar, and then set yylval.string to whatever a lookup function returns. My problem is that this doesn't help when the token appears right next to other text:

kbsh:/home/kbrandt% echo ~
/home/kbrandt
kbsh:/home/kbrandt% echo ~/foo
/home/kbrandt /foo
kbsh:/home/kbrandt%

      

Lex definition for variables:

\$[a-zA-Z/0-9_]+    {
    /* Skip the leading '$' and look up the variable's value. */
    yylval.string = return_value(&variables, yytext + 1);
    return WORD;
}

      

Then in my grammar I have things like:

chdir_command:
    CD WORD { change_dir($2); }
    ;

      

Does anyone know a good way to handle this kind of thing? Am I doing it all wrong?



2 answers


The way "traditional" shells deal with things like variable substitution is difficult to handle with lex / yacc. What they do is more like macro expansion: AFTER expanding a variable, they re-tokenize the input without expanding any further variables. For example, an input such as "xx${$foo}", where "foo" is defined as "bar" and "bar" is defined as "$y", will expand to "xx$y", which is treated as a single word (and $y will NOT be expanded).
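To make the "expand once, never rescan" rule concrete, here is a minimal C sketch. It is deliberately simplified: it only handles the plain $name form (not ${...}), and the variable table, lookup_var and expand_once are hypothetical names made up for this illustration, not part of flex, bison, or any real shell.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable table, for illustration only. */
struct var { const char *name; const char *value; };

static const struct var demo_vars[] = {
    { "foo", "bar" },
    { "bar", "$y"  },
    { "y",   "not reached through bar" },
};

/* Return the value of the variable named name[0..len), or "" if unset. */
static const char *lookup_var(const char *name, size_t len)
{
    for (size_t i = 0; i < sizeof demo_vars / sizeof demo_vars[0]; i++) {
        if (strlen(demo_vars[i].name) == len &&
            strncmp(demo_vars[i].name, name, len) == 0)
            return demo_vars[i].value;
    }
    return "";
}

/* Append n bytes of src to the growing, NUL-terminated buffer *buf. */
static void append(char **buf, size_t *len, const char *src, size_t n)
{
    char *grown = realloc(*buf, *len + n + 1);
    assert(grown != NULL);      /* sketch only: real code should report this */
    memcpy(grown + *len, src, n);
    *len += n;
    grown[*len] = '\0';
    *buf = grown;
}

/* Expand each $name in `in` exactly once. Replacement text is copied to
 * the output verbatim and is never scanned for another '$'.
 * The caller frees the result. */
char *expand_once(const char *in)
{
    char *out = NULL;
    size_t len = 0;

    append(&out, &len, "", 0);  /* start with an empty string */
    while (*in != '\0') {
        if (in[0] == '$' && (isalpha((unsigned char)in[1]) || in[1] == '_')) {
            const char *name = in + 1;
            size_t n = 0;
            while (isalnum((unsigned char)name[n]) || name[n] == '_')
                n++;
            const char *value = lookup_var(name, n);
            append(&out, &len, value, strlen(value));
            in = name + n;      /* continue in the input, after the name */
        } else {
            append(&out, &len, in, 1);
            in++;
        }
    }
    return out;
}
```

With this table, expand_once("xx$bar") produces "xx$y": the "$y" that came out of the substitution is left alone because the loop only ever advances through the original input.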

You can handle this in flex, but you need a lot of supporting code. You need to use flex's yy_buffer_state machinery to redirect the expanded text into a buffer that you then rescan, and use start states carefully to control when variables may and may not be expanded.

It's probably easier to use a very simple lexer that returns tokens like ALPHA (one or more alphabetic characters), NUMERIC (one or more digits) or WHITESPACE (one or more spaces or tabs), and have the parser assemble them appropriately. You end up with rules such as:



simple_command: wordlist NEWLINE ;

wordlist: word | wordlist WHITESPACE word ;

word: word_frag
    | word word_frag { $$ = concat_string($1, $2); }
;

word_frag: single_quote_string
         | double_quote_string
         | variable
         | ALPHA
         | NUMERIC
        ...more options...
;

variable: '$' name { $$ = lookup($2); }
        | '$' '{' word '}' { $$ = lookup($3); }
        | '$' '{' word ':' ....

      

As you can see, this gets complicated quite quickly.
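The grammar above calls concat_string without showing it. One possible shape for that helper, sketched in C under the assumption that every word_frag value is a heap-allocated string owned by the parser (dup_string is a hypothetical helper added only for this example):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Portable stand-in for strdup(3): return a heap copy of s. */
char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    assert(copy != NULL);       /* sketch only: real code should report this */
    memcpy(copy, s, n);
    return copy;
}

/* Join two heap-allocated fragments into one new heap string.
 * Takes ownership of both arguments and frees them, so the action
 * { $$ = concat_string($1, $2); } does not leak the pieces. */
char *concat_string(char *left, char *right)
{
    size_t left_len = strlen(left);
    size_t right_len = strlen(right);
    char *joined = malloc(left_len + right_len + 1);

    assert(joined != NULL);
    memcpy(joined, left, left_len);
    memcpy(joined + left_len, right, right_len + 1);  /* includes the NUL */
    free(left);
    free(right);
    return joined;
}
```

Because adjacent fragments are glued together inside word, an input like ~/foo would come out as the single word /home/kbrandt/foo rather than two separate words, provided the lexer returns the expanded ~ and the /foo text as neighbouring fragments with no WHITESPACE between them.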



That looks generally OK.

I'm not sure what return_value does, but I hope it will strdup(3) the variable name, because yytext is just a buffer.
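The reason for the copy is that the scanner reuses yytext for the next token, so a pointer into it goes stale. A small C sketch of a lookup that always hands back its own heap copy; struct var_table, lookup_copy and dup_string are hypothetical names for illustration, since the question does not show how return_value is written:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable table; the question's real one is not shown. */
struct var { const char *name; const char *value; };
struct var_table { const struct var *vars; size_t count; };

/* Portable stand-in for strdup(3): return a heap copy of s. */
char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    assert(copy != NULL);       /* sketch only: real code should report this */
    memcpy(copy, s, n);
    return copy;
}

/* Look up `name` and return a freshly allocated copy of its value
 * (or of "" if the variable is unset). The result never points into
 * the scanner's buffer or into the table, so the caller may keep it
 * for as long as it likes and must free it when done. */
char *lookup_copy(const struct var_table *table, const char *name)
{
    for (size_t i = 0; i < table->count; i++) {
        if (strcmp(table->vars[i].name, name) == 0)
            return dup_string(table->vars[i].value);
    }
    return dup_string("");
}
```

In the lex action this would be called with yytext + 1, and the string stored in yylval.string stays valid after the scanner moves on to the next token.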



If you're asking about the division of labor between lex and parse, I'm sure it's perfectly reasonable to push the macro processing and parameter substitution into the scanner, and just have your grammar deal with WORDs, lists, commands, pipelines, redirections, etc. After all, it would be reasonable enough, if somewhat out of style and possibly defeating the point of your exercise, to do everything with hand-written code.

I do think that making cd or chdir a terminal symbol and using it in a grammar production is ... not the best design decision. Just because a command is built in doesn't mean it should appear as a grammar rule. Go ahead and parse cd and chdir like any other command. Check for built-in semantics in an action, not in a production.

After all, what if it is redefined as a shell function?
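One way to do that check in an action is a lookup table consulted after the command has been parsed as an ordinary word list. This is only a sketch: find_builtin, run_command, builtin_cd and the NOT_BUILTIN code are hypothetical names, and it assumes a POSIX system for chdir(2).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NOT_BUILTIN (-2)   /* hypothetical code: caller should fork/exec */

typedef int (*builtin_fn)(int argc, char **argv);

/* cd [dir]: change to dir, or to $HOME when no argument is given. */
static int builtin_cd(int argc, char **argv)
{
    const char *target = (argc > 1) ? argv[1] : getenv("HOME");
    if (target == NULL)
        return -1;
    return chdir(target);
}

/* Built-ins are data, not grammar: adding one means adding a row here. */
static const struct { const char *name; builtin_fn fn; } builtins[] = {
    { "cd",    builtin_cd },
    { "chdir", builtin_cd },
};

/* Return the handler for a built-in command name, or NULL if none. */
builtin_fn find_builtin(const char *name)
{
    for (size_t i = 0; i < sizeof builtins / sizeof builtins[0]; i++) {
        if (strcmp(builtins[i].name, name) == 0)
            return builtins[i].fn;
    }
    return NULL;
}

/* Called from the action of a generic "command" rule once argv is built. */
int run_command(int argc, char **argv)
{
    assert(argc > 0 && argv[0] != NULL);
    builtin_fn fn = find_builtin(argv[0]);
    if (fn != NULL)
        return fn(argc, argv);
    return NOT_BUILTIN;
}
```

A shell that supports user-defined functions could consult its function table before find_builtin, which is exactly the kind of override a hard-coded CD token in the grammar would make awkward.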
