Lab 2: Abstract Syntax Tree and ElaboratorLab OverviewIn this lab, you'll implement the abstract syntax tree for Tiger, and design and implement an elaborator (type checker among others) for the abstract syntax tree. This lab consists of two parts: in the first part, you'll implement an abstract syntax tree for MiniJava, by designing a bunch of Java classes, and also you'll add semantic actions to the parser to generate abstract syntax trees automatically. In the second part of the lab, you'll implement an elaborator for the MiniJava language, which will involve symbol table design, type checking, and translation, etc..Getting StartedFirst check out the start code for this lab:$ git commit -am 'my solution to lab1' $ git checkout -b Lab2 origin/Lab2which will commit your changes to Lab1 branch of the local Git repository and create and check out into a new Lab2 branch from the remote Lab2 branch. You will now need to merge the Lab1 into the Lab2: $ git merge Lab1In some cases, Git may not be able to figure out how to merge your changes with the new lab assignment (e.g. if you modified some of the code that is changed in the second lab assignment). In that case, the git merge command will report you which files are conflicted, and you should first resolve the conflict (by editing the relevant files) and then commit the resulting files with git commit -a: $ git commit -am 'lab2 init'Lab 2 contains the following new source files, which you should browse through: src/ast/*: abstract syntax tree data structure and visitors src/elaborator/*: files relevant to the elaborator Hand-in ProcedureWhen you finished your lab, zip you code and submit to the school's information system.Part A: Abstract Syntax Tree for MiniJavaThe sole purpose of a compiler is to process programs (which are being compiled). From the concept point of view, modern computers are based on the Von Neumann architecture, of which the most important feature is the stored program. That is, the program and data being processed must first be stored into the memory, before the processing task can proceed. Compilers are of no exceptions: the program being compiled must first be stored into the main memory, before the compiler can start to compile it.
And the question is how. Maybe the most
straightforward way to do this is to represent the program
being compiled as a string of character (the program text as
the programmer writes down), but this
approach has the obvious drawback that later phases of the
compiler would become more complicated because it's non-trivial
to recognize program phrase from the string (think how to
find out whether or a variable A much more convenient way to store, in memory, the program being compiled is to use some specialized data structures from the implementation language. Nevertheless to say, this approach is highly implementation language-dependent. For instance, the representation technique from using Java would be different from that using C or other implementation languages. Abstract Syntax Trees as Class HierarchyWe are implementing a compiler for MiniJava using the implementation language Java, so we use a so-called local class hierarchy technique from Java. Essentially, to represent a program written in MiniJava, we define a set of classes. These classes are grouped in this way: for any production ruleN -> a1 | a2 | ... | anthere will be an abstract base class CN for
that nonterminal N . And
for each production rule of that nonterminal, there will be
a sub-class inheriting from that abstract base class. For
instance, in the example above, there will be totally n
sub-classes for a1 to an .
As those classes will form a tree-like layout in memory, so those classes are called an abstract syntax tree data structure. We've offered you the classes for defining MiniJava abstract syntax trees, and you should familiarize yourself with these classes. Exercise 1.
Read the code in the package You may also read chapter 1 and 4 of the Tiger book to learn more of local class hierarchy design. Program Representation as TreesGiven the data structure definitions for the MiniJava syntax, one can (in principle) manually represent any MiniJava program in memory, though this method is too boring and error-prone to do in everyday programming practice, it does clearly illustrate what's going on under the hood. And also this is the dominant programming styles in some functional languages, like Lisp or Scheme.
In the source Exercise 2.
Read the code in the file Pretty Printing and Visitor PatternThe manual (or even automatic) approach of program representation is complex enough, and in order to gain confidence that the representation is correct respect to the original program text, we would like to double check. The standard way to do this is called pretty printing, which is a classical algorithmic research topic.Again, the implementation of pretty printing is highly implementation language-dependent. For the Java language, the standard way to accomplish this task is to use the so-called visitor pattern. Chapter 4 of the Tiger book gives an excellent introduction to the visitor pattern, you may also refer to some books on design pattern to understand how visitor pattern works. Exercise 3.
Read again the code in the
Compile and then run the Tiger compiler: $ java -cp bin Tiger -testFac
which will pretty print the program System.out.println (new Fac().ComputeFac(10, )); public int ComputeFac(int num, ) num_aux = num * this.ComputeFac(num - 1, );These are actually bugs, detect where these bugs are triggered and fix these bugs by modifying code. Don't forget to do regression tests. Exercise 4.
The current implementation of the pretty printer in
the file Challenge!
Write another pretty printer Challenge!
The pretty printer in the file Tree GenerationThe parser can construct the abstract syntax tree automaticall by it's semantic action attached with the parsing methods.Exercise 5.
Modfity the parser $ java -cp bin Tiger <inputFile> -dump astFix any bugs before you continue. Challenge! If you've done the LALR parser from the previous lab, then generate the abstract syntax trees from that parser by adding semantic actions. Part B: ElaboratorBefore continuing to do other operations on the abstract syntax trees, one must check these trees to ensure they are well-formed. By the terminology well-formed, we mean that the input MiniJava program must obey all constraint rules specified by the MiniJava language specification (in turn by the Java specification). Typical rules include: a variable must be declared first before its use; the"+" operators
must apply to two operands of integer type; the methods being
invoked on an object must have
been defined in its class or superclass; and so on. All such checking are preformed
by an elaborator in the Tiger compiler. Note that in the
literature, there are other names for the elaborator, say
type checker or semantic analyzer, but we will
call it an elaborator here in this lab.
Symbol TablesJust like the pretty printing, the process of elaboration involves some kind of (post-order) tree traversal. However, there is a key difference: the elaboration traversal is context-sensitive, thus one have to record necessary context information. One can use some tables to record these. For historical reason, these tables are often called symbol tables, for it associate symbols with their associated informations (such as types, scopes, etc..).The design of the symbol table is highly-dependent on the language being compiled. As for MiniJava, there are instance variables and methods in each class, so one need one big symbol table, which maps class name to its containing variables and methods; we call this table the class table. And each method contains parameters and locals variables, so one need a symbol table for each method which maps parameters or locals to their corresponding types (among other informations); we call this table the method table. Exercise 6.
Read the code in
It's often the case that one
need to dump the content the class table or method table
after they have been created, to make sure that they
contain everything one need. Now finish the methods $ java -cp bin Tiger -elab classTable -elab methodTable <inputFile> Elaboration RulesThe elaboration (type checking) is a post-order traversal of the abstract syntax tree. For sub-trees, the elaborator returns the type of that sub-tree; and for root-node, the elaborator compares the types returned from the sub-trees and guarantee the sub-tree types can be combined in some proper way. For instance, for the tree representing an addition node e1+e2, the elaborator first traverse e1 to obtain a type t1, and then e2 to obtain a type t2; at the root node, the elaborator will check that both t1 and t2 are int. Again, these checking rules are language-dependent, and for the Java language, these rules are well specified in the language specification.Exercise 7.
Finish the code in the elaborator Error Handling RecoveryIt's the elabortor's job to report all possible semantic errors to the programmer and to recover from the current error to check as many errors as possible. The error handling code in current implementation is very rudimentary in that it only reports some information and then aborts.Exercise 8.
Modify the error reporting method
Declaration Site and Use SitesLike many other language, Java obeys a declaration-before-use rule, which simply means that for each variable in the program, there should be one and only one declaration site for that variable. However, it's not true for the vice versa: for one variable declaration, there may be zero or more number of uses of that variable; if the number of the uses is zero, then there is the chance of locating an unused variable. A good compiler may generate some error message like this one:Warning: variable "x" declared at line 10 never usedjust as what your Eclipse has always been generating for your Java code. Exercise 9. Modify your Tiger compiler to emit warning messages like the one above. HandinThis completes the lab. Remember to hand in your solution to the information system. |