How can I create my own JVM oriented programming language?
I would like to create my own JVM oriented programming language. I'm not sure how to do this. Should I create my own compiler? Do all programming languages have unique compilers or are there existing ones that can be adapted?
I found some information on targeting the .NET CLI.
I also found the Dragon Book on compiler design.
Yes, each language has its own compiler. There are several kinds of compilers you can write, each more complex than the previous one and building on it:
- a recognizer, which only answers whether the input source is syntactically valid,
- a parser, which also builds an internal representation of the input source (called the AST, abstract syntax tree),
- a translator, which generates a translated form of the input,
- an optimizing compiler, which works like the translator but optimizes the AST before generating the output.
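To make the first kind concrete, here is a minimal sketch of a hand-written recognizer in Java. It assumes a hypothetical toy grammar (integers joined by `+` or `-`), and the class name `Recognizer` is illustrative. It builds nothing; it only reports whether the input is valid.

```java
// A minimal recognizer for a hypothetical toy grammar:
//   expr   := number (('+' | '-') number)*
//   number := one or more ASCII digits
// Whitespace between tokens is ignored. Nothing is built; it only says yes or no.
final class Recognizer {
    private final String src;
    private int pos = 0;

    private Recognizer(String src) { this.src = src; }

    /** Returns true if the whole input matches the grammar above. */
    static boolean accepts(String input) {
        Recognizer r = new Recognizer(input);
        return r.expr() && r.atEnd();
    }

    private boolean expr() {
        if (!number()) return false;
        while (true) {
            skipSpaces();
            if (pos < src.length() && (src.charAt(pos) == '+' || src.charAt(pos) == '-')) {
                pos++;                           // consume the operator
                if (!number()) return false;     // an operator must be followed by a number
            } else {
                return true;
            }
        }
    }

    private boolean number() {
        skipSpaces();
        int start = pos;
        while (pos < src.length() && src.charAt(pos) >= '0' && src.charAt(pos) <= '9') pos++;
        return pos > start;                      // at least one digit was consumed
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    private boolean atEnd() {
        skipSpaces();
        return pos == src.length();              // reject trailing input such as "1 2"
    }
}
```

For example, `Recognizer.accepts("1 + 2")` returns `true`, while `Recognizer.accepts("1 +")` returns `false`.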
All of these kinds of compilers usually rely on tools designed to help with the different stages of compilation. In short:
Parsing: I would recommend parboiled for Java. The older tools were usually variants of lex and yacc, two Unix tools for lexical analysis and grammar-based parsing. ANTLR and JavaCC are two examples that run on the JVM; however, parboiled is amazing.
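What a lexer generator automates can also be done by hand for a small language, which helps when learning what the tools do. The following is not parboiled, ANTLR, or JavaCC code; it is a minimal hand-written tokenizer sketch (Java 17) for a hypothetical arithmetic language, with illustrative names `Kind`, `Token`, and `Lexer`:

```java
import java.util.ArrayList;
import java.util.List;

// Token kinds for a hypothetical toy arithmetic language.
enum Kind { NUMBER, PLUS, MINUS, STAR, SLASH, LPAREN, RPAREN }

// A token pairs its kind with the exact source text it came from.
record Token(Kind kind, String text) {}

final class Lexer {
    /** Splits the input into tokens, skipping whitespace; throws on unknown characters. */
    static List<Token> tokenize(String src) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                   // whitespace separates tokens
            } else if (c >= '0' && c <= '9') {
                int start = i;                         // a run of digits is one NUMBER
                while (i < src.length() && src.charAt(i) >= '0' && src.charAt(i) <= '9') i++;
                tokens.add(new Token(Kind.NUMBER, src.substring(start, i)));
            } else {
                Kind kind = switch (c) {               // single-character tokens
                    case '+' -> Kind.PLUS;
                    case '-' -> Kind.MINUS;
                    case '*' -> Kind.STAR;
                    case '/' -> Kind.SLASH;
                    case '(' -> Kind.LPAREN;
                    case ')' -> Kind.RPAREN;
                    default -> throw new IllegalArgumentException(
                            "Unexpected character '" + c + "' at offset " + i);
                };
                tokens.add(new Token(kind, String.valueOf(c)));
                i++;
            }
        }
        return tokens;
    }
}
```

The parser stage would then consume this token list instead of raw characters, which is the same split that lex and yacc make.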
AST: I don't know of any tool here. It is possible to reuse a model from another JVM language, such as javac's, but I personally would create this myself.
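A hand-rolled AST can be very small. This sketch (Java 17, using a sealed interface and records) assumes a hypothetical arithmetic language; `Expr`, `Num`, `BinOp`, and `AstPrinter` are illustrative names, not part of javac or any library:

```java
// A hand-rolled AST for a hypothetical toy arithmetic language (Java 17+).
sealed interface Expr permits Num, BinOp {}

// Integer literal leaf node.
record Num(long value) implements Expr {}

// Binary operation node: op is one of '+', '-', '*', '/'.
record BinOp(char op, Expr left, Expr right) implements Expr {}

final class AstPrinter {
    /** Renders the tree as an S-expression so its structure is visible. */
    static String show(Expr e) {
        if (e instanceof Num n) {
            return Long.toString(n.value());
        } else if (e instanceof BinOp b) {
            return "(" + b.op() + " " + show(b.left()) + " " + show(b.right()) + ")";
        }
        throw new IllegalStateException("unknown node: " + e);
    }
}
```

For example, `AstPrinter.show(new BinOp('+', new Num(1), new BinOp('*', new Num(2), new Num(3))))` returns `(+ 1 (* 2 3))`. Records also give structural `equals` for free, which is handy when tests compare a parsed tree against an expected one.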
Generating output: A fast approach is to generate Java source code, which has some limitations but is a great way to test the waters overall. When/if you decide to move on to generating JVM bytecode, a collection of helper libraries can be found here. However, you need to know a lot about the JVM before you try this route; the JVM specification from Oracle is a must-read.
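Generating Java source can be as simple as a recursive walk over the AST that builds strings. This is a sketch under assumptions: the toy AST, `JavaSourceGenerator`, and the generated class name are all hypothetical:

```java
// Minimal AST (hypothetical) for the toy language.
interface Expr {}
record Num(long value) implements Expr {}
record BinOp(char op, Expr left, Expr right) implements Expr {}

final class JavaSourceGenerator {
    /** Translates one expression. Every binary node is parenthesized so that
     *  Java's own precedence rules cannot change the meaning of the tree. */
    static String expr(Expr e) {
        if (e instanceof Num n) {
            return n.value() + "L";               // long literal, so large values still compile
        } else if (e instanceof BinOp b) {
            return "(" + expr(b.left()) + " " + b.op() + " " + expr(b.right()) + ")";
        }
        throw new IllegalArgumentException("unknown node: " + e);
    }

    /** Wraps the expression in a complete, compilable Java class that prints its value. */
    static String program(String className, Expr body) {
        return "public class " + className + " {\n"
             + "    public static void main(String[] args) {\n"
             + "        System.out.println(" + expr(body) + ");\n"
             + "    }\n"
             + "}\n";
    }
}
```

Writing `JavaSourceGenerator.program("Generated", tree)` for the tree of `1 + 2 * 3` to `Generated.java`, then running `javac Generated.java` and `java Generated`, should print `7`. The output is plain text you can read and diff, which is the main attraction of this route.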
For general knowledge, the LLVM tutorial is excellent; it is rather short and very well written. I know you said you wanted to target the JVM, but almost everything this tutorial covers will help you understand the details you need.
I would recommend following that tutorial and rewriting it in Java. Its steps are very logical. For example, you could write a recognizer for a very simple language such as "1 + 2", then write an interpreter for that language. This would be a very reasonable stopping point: many languages are interpreted, and Java also began its life that way. Optionally, you can go on to emit target output, such as Java source code. The code for each step would be fairly short and will give you faster feedback than trying to finish any one layer completely first. This road offers plenty of ways to spend your coding hours.
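The step from recognizer to interpreter can be sketched as a recursive-descent interpreter that evaluates while it parses. The grammar (integer `+ - * /` with parentheses) and the class name `Interpreter` are assumptions for illustration, not code from the tutorial:

```java
// A tiny interpreter for a hypothetical toy language of integer arithmetic:
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := number | '(' expr ')'
// It evaluates while parsing (recursive descent), so no AST is built.
final class Interpreter {
    private final String src;
    private int pos = 0;

    private Interpreter(String src) { this.src = src; }

    /** Evaluates the whole input; throws IllegalArgumentException on a syntax error. */
    static long eval(String input) {
        Interpreter in = new Interpreter(input);
        long value = in.expr();
        in.skipSpaces();
        if (in.pos != in.src.length()) throw in.error("unexpected input");
        return value;
    }

    private long expr() {
        long value = term();
        while (true) {                             // left-associative: 8 - 3 - 2 == 3
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

    private long term() {
        long value = factor();
        while (true) {
            if (accept('*')) value *= factor();
            else if (accept('/')) value /= factor(); // integer division; ArithmeticException on /0
            else return value;
        }
    }

    private long factor() {
        if (accept('(')) {
            long value = expr();
            if (!accept(')')) throw error("expected ')'");
            return value;
        }
        skipSpaces();
        int start = pos;
        while (pos < src.length() && src.charAt(pos) >= '0' && src.charAt(pos) <= '9') pos++;
        if (pos == start) throw error("expected a number");
        return Long.parseLong(src.substring(start, pos));
    }

    /** Consumes c (after optional whitespace) if it is the next character. */
    private boolean accept(char c) {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException(msg + " at offset " + pos);
    }
}
```

For example, `Interpreter.eval("2 * (3 + 4) - 5")` returns `9`. Swapping the arithmetic in `expr`/`term` for string building is essentially how such an interpreter turns into a translator that emits Java source.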
Chris K. gave a pretty good answer; however, on one point I (as someone who has already written a working compiler for a non-trivial JVM language) must strongly disagree:
At the beginning, your code generator should generate only Java (or Scala, Ceylon, Kotlin, Clojure, ... whatever you like) source code, for the following reasons:
- The other tasks (lexing, parsing, maintaining compiler state such as the symbol table, semantic analysis, etc.) are already quite demanding. Learning yet another library on top of them would be overkill and would significantly delay your first results.
- Once you have everything in place, including code generation, and compile your first program, you will find that your compiler is buggy. These errors are much easier to spot when they show up as nonsensical or erroneous Java code than as erroneous class files. Would you rather get a cryptic message from the bytecode verifier, or look at the generated code in plain text?
- Code generation should be a separate module anyway; nothing in the rest of the compiler does (or should) depend on it. It is therefore easy to replace once you are sure that your compiler actually understands the meaning of its input (the proof being generated Java code that compiles and passes some tests, etc.). Of course, until the generated class files are 100% foolproof, it should be an option whether code is generated as Java source or as bytecode. That way you can compile test programs to both Java and bytecode and run the tests against both results. This makes tracking down errors much easier when a generated class file suddenly does not work.
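A rough sketch of keeping code generation behind one interface, so the backend can be chosen by an option. Everything here is hypothetical: the toy AST, the `Backend` interface, `Backends.select`, and the option values. The second backend only produces a stack-based text listing using real JVM opcode names (`ldc2_w`, `ladd`, `lsub`, `lmul`, `ldiv`); it stands in for a real class-file writer and is simplified (a real `ldc2_w` takes a constant-pool index, not a literal value):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal AST (hypothetical) shared by the front end and every backend.
interface Expr {}
record Num(long value) implements Expr {}
record BinOp(char op, Expr left, Expr right) implements Expr {}

// The only thing the rest of the compiler knows about code generation.
interface Backend {
    String generate(Expr e);
}

// Backend 1: readable Java source, easy to inspect when something goes wrong.
final class JavaSourceBackend implements Backend {
    public String generate(Expr e) {
        if (e instanceof Num n) return n.value() + "L";
        BinOp b = (BinOp) e;
        return "(" + generate(b.left()) + " " + b.op() + " " + generate(b.right()) + ")";
    }
}

// Backend 2: a textual, stack-based listing using JVM opcode names.
// Illustration only: it does not produce a real class file.
final class StackListingBackend implements Backend {
    public String generate(Expr e) {
        List<String> out = new ArrayList<>();
        emit(e, out);
        return String.join("\n", out);
    }

    private void emit(Expr e, List<String> out) {
        if (e instanceof Num n) {
            out.add("ldc2_w " + n.value());     // push a long constant (simplified)
            return;
        }
        BinOp b = (BinOp) e;
        emit(b.left(), out);                    // operands first: postorder traversal
        emit(b.right(), out);
        out.add(switch (b.op()) {
            case '+' -> "ladd";
            case '-' -> "lsub";
            case '*' -> "lmul";
            case '/' -> "ldiv";
            default -> throw new IllegalArgumentException("unknown operator " + b.op());
        });
    }
}

final class Backends {
    /** Picks a backend from a (hypothetical) command-line option value. */
    static Backend select(String option) {
        return switch (option) {
            case "java" -> new JavaSourceBackend();
            case "listing" -> new StackListingBackend();
            default -> throw new IllegalArgumentException("unknown backend: " + option);
        };
    }
}
```

Because the front end only sees `Backend`, adding a real bytecode backend later would not touch lexing, parsing, or semantic analysis, and the same test programs can be run through both backends.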
I personally wouldn't even generate class files until your compiler, written in your own language, can compile itself to Java, and the resulting program can in turn compile the compiler's source just as well.