Replacing macros during code generation
I currently have some legacy code that generates opcodes. The more macros the source contains, the longer code generation takes (on the order of hours!). I went through the logic: it processes each macro by searching for it and substituting every variable in it, much like inlining. Is there a way I can optimize this other than by string manipulation?
You should tokenize your input before starting this kind of process. (I can't recommend the famous Dragon book highly enough: even the ancient edition has stood the test of time, and the 2006 updated version looks great.) Compilation is the kind of work that is best broken down into smaller steps: if your first phase performs lexical analysis, breaking the input into tokens (keywords, identifiers, constants, and so on), it becomes much easier to spot references to macros and look them up in a symbol table. (It is also relatively easy to use a tool like lex or flex, or one of their modern equivalents, to do that job for you, rather than writing it from scratch.)
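To make the lexical-analysis step concrete, here is a minimal hand-rolled lexer sketch; the token kinds and names are invented for illustration, and a lexer generated by flex would replace all of this:

#include <cctype>
#include <string>
#include <vector>

// A deliberately tiny set of token kinds; a real lexer has many more.
enum class TokKind { Identifier, Number, Punct };

struct Token {
    TokKind kind;
    std::string text;
};

// Break one line of source text into tokens.
std::vector<Token> tokenize(const std::string& line) {
    std::vector<Token> out;
    size_t i = 0;
    while (i < line.size()) {
        unsigned char c = line[i];
        if (std::isspace(c)) { ++i; continue; }
        if (std::isalpha(c) || c == '_') {           // identifier or keyword
            size_t start = i;
            while (i < line.size() &&
                   (std::isalnum((unsigned char)line[i]) || line[i] == '_'))
                ++i;
            out.push_back({TokKind::Identifier, line.substr(start, i - start)});
        } else if (std::isdigit(c)) {                // integer constant
            size_t start = i;
            while (i < line.size() && std::isdigit((unsigned char)line[i]))
                ++i;
            out.push_back({TokKind::Number, line.substr(start, i - start)});
        } else {                                     // single-char punctuation
            out.push_back({TokKind::Punct, std::string(1, line[i])});
            ++i;
        }
    }
    return out;
}

Once the source is in this form, "does this identifier name a macro?" becomes a single table lookup rather than a substring search over the whole line.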
The key observation seems to be "if the code has more macros, then it takes much longer to generate the code": the process appears to be linear in the number of macro definitions, which quickly becomes far too slow. My guess is that this happens one line at a time (if your language allows that, it obviously makes a huge difference, since you don't have to treat the whole program as one huge string), and the pseudocode looks something like:
for (each line in the program)
{
    for (each macro definition)
    {
        test if the macro appears;
        perform replacement if needed;
    }
}
That scales badly with the number of macros: every line is tested against every definition.
With tokenization, it looks something like this:
for (each line in the program)
{
    tokenize the line;
    for (each token in the line)
    {
        switch (based on the token type)
        {
        case (an identifier):
            look up the identifier in the table of macro names;
            perform replacement as necessary;
            ...
        }
    }
}
which scales mostly with program size, not with the number of definitions. The symbol table lookup can of course use a better data structure than a linear scan, so the number of macros stops being a significant factor. This second step is, again, something that tools like yacc and bison (and their more modern variants) can happily generate code for.
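To make the lookup cost concrete, here is a minimal sketch using a hash table; the types and names are invented for illustration:

#include <string>
#include <unordered_map>
#include <vector>

// Macro definitions keyed by name in a hash table, so checking one
// identifier costs O(1) on average, no matter how many macros exist.
struct MacroDef {
    std::vector<std::string> params;  // parameter names
    std::string body;                 // stored body (or a token stream; see below)
};

using MacroTable = std::unordered_map<std::string, MacroDef>;

// For a single identifier token, decide whether it names a macro.
const MacroDef* lookupMacro(const MacroTable& table, const std::string& ident) {
    auto it = table.find(ident);
    return it == table.end() ? nullptr : &it->second;
}

With, say, a thousand macro definitions, the naive scan performs a thousand tests per line; the hash lookup performs roughly one per identifier.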
Afterthought: when parsing macro definitions, you can also store them as a token stream, marking the identifiers that are placeholders for the parameters. When expanding a macro, you stream those tokens back out, substituting the actual arguments for the placeholders. (Again, this is the kind of thing flex can do easily.)
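A minimal sketch of that afterthought, with invented types: the stored body is a token list in which parameter names were replaced at definition time by indexed placeholders, so each expansion is one linear pass (arguments are simplified here to a single token of text each):

#include <string>
#include <vector>

// One token of a stored macro body. Tokens that were parameter names
// become Placeholder entries carrying the parameter's index.
struct BodyToken {
    enum Kind { Literal, Placeholder } kind;
    std::string text;   // used when kind == Literal
    int paramIndex;     // used when kind == Placeholder
};

// Expand a stored body by splicing the call's argument texts into the
// placeholder slots; no re-scanning of the body for names.
std::vector<std::string> expand(const std::vector<BodyToken>& body,
                                const std::vector<std::string>& args) {
    std::vector<std::string> out;
    for (const BodyToken& t : body) {
        if (t.kind == BodyToken::Literal)
            out.push_back(t.text);
        else
            out.push_back(args[t.paramIndex]);
    }
    return out;
}

For the Add macro in the question below, the body int c = a + b would be stored as int, c, =, <param 0>, +, <param 1>, and expanding with arguments x and y yields int c = x + y.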
I have an application that has its own grammar. It supports all the datatypes a typical compiler supports, and macros as well. More precisely, it is a kind of compiler that generates opcodes, taking a program (written in this grammar) as input. To process macros, text-replacement logic is used. For example:
Macro Add (a: int, b: int)
    int c = a + b
End of the macro
// Program sum
..
int x = 10, y = 10;
Add (x, y);
..
// End of program
After replacement, it becomes:
// Program sum
..
int x = 10, y = 10;
int c = x + y
..
// End of program
This text replacement, substituting the macro body for every macro call, takes a very long time. Is there a more efficient way to do it?
It's really hard to answer without knowing more about your preprocessing / parsing / compilation process. One idea would be to store the macro names in a symbol table. While parsing, check each identifier token against this table. When you find a match, write the expansion into a new buffer and run it through the parser, then continue parsing the original text after the macro call's closing parenthesis.
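One way to picture "run it through the parser, then continue" is a pushback buffer that sits in front of the real input; this is only a sketch with invented names, not code from any particular parser:

#include <deque>
#include <istream>
#include <string>

// A small input layer that lets the parser consume a macro expansion
// before resuming the original source text.
class InputStack {
public:
    // Called when a macro call is recognized: queue the expansion so
    // it is read next, ahead of the remaining original input.
    void pushExpansion(const std::string& text) { pending_.push_front(text); }

    // Fetch the next line to parse: queued expansions first, then the
    // real input stream.
    bool nextLine(std::istream& original, std::string& out) {
        if (!pending_.empty()) {
            out = pending_.front();
            pending_.pop_front();
            return true;
        }
        return static_cast<bool>(std::getline(original, out));
    }

private:
    std::deque<std::string> pending_;
};

Because expansions are queued in front of the input, an expansion that itself contains macro calls is handled by the same loop.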
Depending on the syntax of your opcodes, another idea: when you come across a macro definition during parsing, generate its opcodes immediately, but put placeholders where the arguments would go. Then, when the parser encounters a macro call, generate code to evaluate the arguments and insert that code into the pre-generated macro opcodes in place of the placeholders.
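A rough sketch of that placeholder idea, with an entirely invented opcode format (your real encoding will certainly differ):

#include <cstdint>
#include <vector>

// One pre-generated instruction. paramSlot >= 0 marks an operand that
// is a placeholder to be patched with the macro call's Nth argument.
struct Instr {
    uint8_t op;
    int32_t operand;
    int paramSlot;  // -1 once the operand is resolved
};

// Copy the macro's pre-generated opcode template, patching each
// placeholder with the location holding the evaluated argument.
std::vector<Instr> instantiate(const std::vector<Instr>& templ,
                               const std::vector<int32_t>& argLocations) {
    std::vector<Instr> out = templ;
    for (Instr& ins : out) {
        if (ins.paramSlot >= 0) {
            ins.operand = argLocations[ins.paramSlot];
            ins.paramSlot = -1;
        }
    }
    return out;
}

The macro body's opcodes are generated once at definition time; each call then only costs the argument-evaluation code plus this copy-and-patch pass.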