Write a compiler for a language that looks ahead and multiple files?

In my language, I can use a class variable in my method when the definition appears below the method. It can also call methods below my method, etc. There are no "headers". Take this C # example.

class A
{
    public void callMethods() { print(); B b; b.notYetSeen();
    public void print() { Console.Write("v = {0}", v); }
    int v=9;
}

class B
{
    public void notYetSeen() { Console.Write("notYetSeen()\n"); }
}

      

How do I compile this? what i thought:

  • pass1: convert everything to AST
  • pass2: go through all classes and create a list of class definitions / variables / etc
  • pass3: go through the code and check if there are any errors such as variable undefined, misuse, etc. and create my output

But it seems to me that for this I need to do pass 1 and 2 for ALL files before doing pass3. It also seems to me that there is a lot of work to do until I find a syntax error (other than the obvious one that can be done during parsing, such as forgetting to close the parenthesis or writing 0xLETTERS instead of a hexadecimal value). My gut says there is another way.

Note. I am using bison / flex to generate my compiler.

+2


source to share


4 answers


My understanding of languages โ€‹โ€‹that handle direct references is that they usually just use the first pass to create a list of valid names. Something line by line just puts the record in the table (without populating the definition), so you need to specify something later when you do your real pass to generate the definitions.



If you try to actually build complete definitions as you go, you will have to re-scan, each time keeping references to undefined things until the next pass. Even if there were no circular references.

+3


source


I would go through one and collect all your names and types of your class / method / field, ignoring the method bodies. Then, in a pass, two only check the method bodies.



+1


source


I don't know what other way there might be than moving all files in the source.

I think you can get it up to two passes - on the first pass, build the AST and whenever you find a variable name add it to the list containing the characters of those blocks (it would probably be useful to add this list to the corresponding scope in tree). The second step is to traverse the tree linearly and make sure that every symbol you use refers to a symbol in that area or the area above it.

My description is oversimplified, but the main answer is that lookahead requires at least two passes.

0


source


The usual approach is to save B

as "unknown". It's probably some sort of type (due to where you ran into it). Thus, you can simply reserve memory (pointer) for it, even if you don't know what it really is.

There is not much you can do to call a method. In a dynamic language, you just store the method name somewhere and check if it exists at runtime. In a static language, you can store it under "unknown methods" somewhere in your compiler, along with the unknown type B

. Since method calls end up being translated to a memory address, you can reserve memory again.

Then when you come across a B

and method, you can clear your unknowns. Since you know a little about them, you can tell if they behave the way they should, or if the first use is now a syntax error.

So you don't need to read all files twice, but it certainly makes things easier.

Alternatively, you can generate these header files when you come across sources and save them somewhere where you can find them again. This way you can speed up compilation (since you don't have to consider immutable files in the next compilation run).

Finally, if you are writing a new language, you should no longer use bison and flex. There are much better tools by now. ANTLR , for example, can create a parser that can recover from an error, so you can still parse the entire file. Or check this Wikipedia article for more options.

0


source







All Articles