Xtext: relationship between AST, metamodel and parse tree

Can anyone explain to me the relationship between the syntax tree, the AST and the metamodel? I know Xtext derives the EMF Ecore metamodel from the grammar and generates a parser with ANTLR. But how is the input actually parsed? First it goes through the lexer, and then the parser builds a syntax tree from the parser rules, right? Does Xtext then also create an AST from the syntax tree? What for? And what is the purpose of the metamodel in this case? I am a bit confused by all these definitions.



1 answer


You are correct about the three-step parsing procedure: first the lexer processes the input stream, then the ANTLR-based parse tree is created, and finally Xtext builds the EMF-based AST from the parse tree. The first two steps are natural for every parser generator; the third step needs some explanation. I'll start with a somewhat long motivation, and then briefly talk about metamodels and EMF in general.
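To make the three phases concrete, here is a toy sketch in plain Java. It does not use Xtext, ANTLR or EMF; the grammar and the names `ParseNode`, `Entity` and `Pipeline` are hypothetical. Phase 2 produces an untyped tree whose nodes are only labeled with rule names or token text, while phase 3 turns that tree into typed objects, which is the role the EMF model plays in Xtext. The phases are kept separate here purely for readability.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only (not Xtext's API). Hypothetical grammar:
//   Model:  Entity* ;
//   Entity: 'entity' ID ('extends' ID)? ';' ;

// Phase 2 output: an untyped parse-tree node labeled with a rule name or token text.
class ParseNode {
    final String label;
    final List<ParseNode> children = new ArrayList<>();
    ParseNode(String label) { this.label = label; }
}

// Phase 3 output: a typed model element, similar in spirit to a class derived from the grammar.
class Entity {
    final String name;
    final String superName;   // still a plain name: references are not resolved yet
    Entity(String name, String superName) { this.name = name; this.superName = superName; }
}

class Pipeline {
    // Phase 1: the lexer turns characters into tokens (whitespace-separated, ';' on its own).
    static List<String> lex(String input) {
        List<String> tokens = new ArrayList<>();
        for (String word : input.replace(";", " ; ").trim().split("\\s+")) {
            if (!word.isEmpty()) tokens.add(word);
        }
        return tokens;
    }

    // Phase 2: the parser builds a tree that mirrors the grammar rules.
    // No error recovery: malformed input simply throws.
    static ParseNode parse(List<String> tokens) {
        ParseNode model = new ParseNode("Model");
        int i = 0;
        while (i < tokens.size()) {
            ParseNode entity = new ParseNode("Entity");
            expect(tokens, i++, "entity");
            entity.children.add(new ParseNode(tokens.get(i++)));          // ID
            if (i < tokens.size() && tokens.get(i).equals("extends")) {
                i++;
                entity.children.add(new ParseNode(tokens.get(i++)));      // ID of the super entity
            }
            expect(tokens, i++, ";");
            model.children.add(entity);
        }
        return model;
    }

    static void expect(List<String> tokens, int i, String expected) {
        if (i >= tokens.size() || !tokens.get(i).equals(expected)) {
            throw new IllegalArgumentException("expected '" + expected + "' at token " + i);
        }
    }

    // Phase 3: walk the untyped tree and instantiate typed model objects.
    static List<Entity> toAst(ParseNode model) {
        List<Entity> result = new ArrayList<>();
        for (ParseNode node : model.children) {
            String name = node.children.get(0).label;
            String superName = node.children.size() > 1 ? node.children.get(1).label : null;
            result.add(new Entity(name, superName));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Entity> ast = toAst(parse(lex("entity A; entity B extends A;")));
        for (Entity e : ast) System.out.println(e.name + " extends " + e.superName);
    }
}
```

Running `main` prints `A extends null` and `B extends A`: the `extends` target of `B` is still only a name, not a link to the `A` object.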

First of all, generated parsers do not support identifier resolution (required for handling variables or function calls). This functionality has to be added manually, so almost every language needs a hand-coded post-processing step that extends the already existing parse tree.
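A minimal sketch of such a hand-written post-processing step follows. The class names are hypothetical and this is not Xtext's actual linking API: the parser leaves each `extends` target as a plain name, and a separate pass builds a scope (here simply a global name-to-object map) and replaces names with object references, collecting an error for every name it cannot resolve.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model element as produced by a parser: references are only names.
class Entity {
    final String name;
    final String superName;   // unresolved name recorded by the parser (null if absent)
    Entity superEntity;       // object reference, filled in by the linker
    Entity(String name, String superName) { this.name = name; this.superName = superName; }
}

class Linker {
    // Resolves every superName; returns error messages (empty list = everything resolved).
    static List<String> link(List<Entity> entities) {
        // Simplest possible scope: every entity is globally visible by its name.
        Map<String, Entity> scope = new HashMap<>();
        for (Entity e : entities) scope.put(e.name, e);

        List<String> errors = new ArrayList<>();
        for (Entity e : entities) {
            if (e.superName == null) continue;           // nothing to resolve
            Entity target = scope.get(e.superName);
            if (target == null) {
                errors.add("unresolved reference '" + e.superName + "' in entity " + e.name);
            } else {
                e.superEntity = target;                  // the tree now becomes a graph
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Entity a = new Entity("A", null);
        Entity b = new Entity("B", "A");
        Entity c = new Entity("C", "Missing");
        List<String> errors = link(List.of(a, b, c));
        System.out.println(b.superEntity == a);
        System.out.println(errors);
    }
}
```

After linking, `b.superEntity` is the same object as `a`, and a single error is reported for `C`. Real languages need richer scoping rules (nested blocks, imports, visibility), which is why this step is language-specific and has to be written by hand.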

Second, EMF provides a good, type-safe API for its models, as well as a powerful reflective API that lets you build very generic yet useful components that make models easier to handle (for example, code generators such as Acceleo or Xtend, and model transformation tools such as ATL, ETL or VIATRA2). I cannot say exactly how the ANTLR parse tree API compares to EMF, but I have worked with the API of the LPG parser generator, and in my opinion EMF is easier to work with.
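The difference between a typed and a reflective API can be sketched without EMF. The `ModelObject` interface below is a hypothetical, heavily simplified stand-in for EMF's reflective access (in EMF itself, `EObject.eClass()` describes an object's type and `EObject.eGet(EStructuralFeature)` reads a feature value). Hand-written code uses the typed getters of `Person`, while a generic component such as the hypothetical `GenericDumper` relies only on the reflective methods and therefore works for any model class.

```java
import java.util.List;

// Hypothetical, simplified reflective view of a model object (not the EMF API).
interface ModelObject {
    String typeName();
    List<String> featureNames();
    Object get(String feature);
}

// A "generated" class: typed getters for hand-written code, plus the reflective view.
class Person implements ModelObject {
    private final String name;
    private final int age;
    Person(String name, int age) { this.name = name; this.age = age; }

    // Type-safe API: the compiler checks names and types.
    public String getName() { return name; }
    public int getAge() { return age; }

    // Reflective API: features are looked up by name at runtime.
    public String typeName() { return "Person"; }
    public List<String> featureNames() { return List.of("name", "age"); }
    public Object get(String feature) {
        switch (feature) {
            case "name": return name;
            case "age": return age;
            default: throw new IllegalArgumentException("unknown feature " + feature);
        }
    }
}

class GenericDumper {
    // Works for ANY ModelObject without knowing its concrete class; this is the kind of
    // generic tooling (generators, transformations, editors) a reflective API makes possible.
    static String dump(ModelObject obj) {
        StringBuilder sb = new StringBuilder(obj.typeName()).append(" {");
        for (String feature : obj.featureNames()) {
            sb.append(' ').append(feature).append('=').append(obj.get(feature));
        }
        return sb.append(" }").toString();
    }

    public static void main(String[] args) {
        Person p = new Person("Ada", 36);
        System.out.println(p.getName());   // typed access
        System.out.println(dump(p));       // generic, reflective access
    }
}
```

Here `dump(new Person("Ada", 36))` returns `Person { name=Ada age=36 }`; adding a new model class would not require any change to `GenericDumper`.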

Better yet, using EMF allows the rich functionality of Xtext to be combined with other EMF-based editors, such as GMF-based graphical editors. See an earlier EclipseCon presentation for the main idea: TMF meets GMF - Combining Text and Graphics Modeling.



In general, if we need to extend our syntax tree with resolution information anyway, reusing an already established paradigm makes it easier to integrate our language with other tools.

EMF relies on the concept of metamodeling: we have to define the set of elements that can be used in the models, along with additional constraints such as which elements may be connected to which. This concept is similar to schema definitions for XML (such as a DTD or XML Schema): they give us a consistent way of describing models. Xtext works with EMF in several ways:

  • First of all, based on the grammar, it generates and registers an EMF metamodel that can be used by any EMF-based tool.
  • The end result of the parsing process is then an EMF model that can be read and modified using the EMF API. Changes made to the model can be converted (serialized) back to text.
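Both points can be illustrated with a toy example using hypothetical classes, not the code Xtext generates or its serializer. For a grammar rule such as `Entity: 'entity' name=ID ('extends' superType=[Entity])? ';';`, the derived metamodel would contain an `Entity` class with a `name` attribute and a `superType` reference; the plain Java class below plays that role. The model is then edited through its API, and a simple serializer writes it back to text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the class a metamodel would define for the rule
//   Entity: 'entity' name=ID ('extends' superType=[Entity])? ';' ;
class Entity {
    String name;          // attribute, from name=ID
    Entity superType;     // cross-reference, from superType=[Entity]; null if absent
    Entity(String name, Entity superType) { this.name = name; this.superType = superType; }
}

class Serializer {
    // Writes the model back to text following the grammar rule above.
    // This toy ignores comments and original formatting.
    static String serialize(List<Entity> model) {
        StringBuilder sb = new StringBuilder();
        for (Entity e : model) {
            sb.append("entity ").append(e.name);
            if (e.superType != null) {
                // The target's current name is written, so renames propagate automatically.
                sb.append(" extends ").append(e.superType.name);
            }
            sb.append(";\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Entity a = new Entity("A", null);
        List<Entity> model = new ArrayList<>(List.of(a, new Entity("B", a)));

        // Modify the model through its API instead of editing the text directly.
        a.name = "Base";
        model.add(new Entity("C", a));

        System.out.print(serialize(model));
    }
}
```

Renaming `A` to `Base` through the API also changes the text of every entity that references it, because the reference is stored as an object rather than as a name: the printed lines are `entity Base;`, `entity B extends Base;` and `entity C extends Base;`.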

Hope the answer was clear enough. Feel free to ask for further clarification if needed.
