Lab 2: Abstract Syntax Tree and Elaborator

Lab Overview

In this lab, you'll implement the abstract syntax tree for Tiger, and design and implement an elaborator (type checker among others) for the abstract syntax tree. This lab consists of two parts: in the first part, you'll implement an abstract syntax tree for MiniJava, by designing a bunch of Java classes, and also you'll add semantic actions to the parser to generate abstract syntax trees automatically. In the second part of the lab, you'll implement an elaborator for the MiniJava language, which will involve symbol table design, type checking, and translation, etc..

Getting Started

First check out the start code for this lab:

  $ git commit -am 'my solution to lab1'
  $ git checkout -b Lab2 origin/Lab2

which will commit your changes to Lab1 branch of the local Git repository and create and check out into a new Lab2 branch from the remote Lab2 branch.

You will now need to merge the Lab1 into the Lab2:

  $ git merge Lab1

In some cases, Git may not be able to figure out how to merge your changes with the new lab assignment (e.g. if you modified some of the code that is changed in the second lab assignment). In that case, the git merge command will report you which files are conflicted, and you should first resolve the conflict (by editing the relevant files) and then commit the resulting files with git commit -a:

  $ git commit -am 'lab2 init'

Lab 2 contains the following new source files, which you should browse through:

  src/ast/*:        abstract syntax tree data structure and visitors
  src/elaborator/*: files relevant to the elaborator

Hand-in Procedure

When you finished your lab, zip you code and submit to the school's information system.

Part A: Abstract Syntax Tree for MiniJava

The sole purpose of a compiler is to process programs (which are being compiled). From the concept point of view, modern computers are based on the Von Neumann architecture, of which the most important feature is the stored program. That is, the program and data being processed must first be stored into the memory, before the processing task can proceed. Compilers are of no exceptions: the program being compiled must first be stored into the main memory, before the compiler can start to compile it.

And the question is how. Maybe the most straightforward way to do this is to represent the program being compiled as a string of character (the program text as the programmer writes down), but this approach has the obvious drawback that later phases of the compiler would become more complicated because it's non-trivial to recognize program phrase from the string (think how to find out whether or a variable x has been declared and where if if does. Obviously, this approach may involve complex string operations).

A much more convenient way to store, in memory, the program being compiled is to use some specialized data structures from the implementation language. Nevertheless to say, this approach is highly implementation language-dependent. For instance, the representation technique from using Java would be different from that using C or other implementation languages.

Abstract Syntax Trees as Class Hierarchy

We are implementing a compiler for MiniJava using the implementation language Java, so we use a so-called local class hierarchy technique from Java. Essentially, to represent a program written in MiniJava, we define a set of classes. These classes are grouped in this way: for any production rule

  N -> a1 | a2 | ... | an

there will be an abstract base class CN for that nonterminal N. And for each production rule of that nonterminal, there will be a sub-class inheriting from that abstract base class. For instance, in the example above, there will be totally n sub-classes for a1 to an.

As those classes will form a tree-like layout in memory, so those classes are called an abstract syntax tree data structure. We've offered you the classes for defining MiniJava abstract syntax trees, and you should familiarize yourself with these classes.

Exercise 1. Read the code in the package ast and all the subpackage prefixed with ast such as ast.exp or ast.stm etc.. Refer to the MiniJava reference manual to understand how these classes correspond to the MiniJava syntax. (For now, just ignore the method accept in every class.)

You may also read chapter 1 and 4 of the Tiger book to learn more of local class hierarchy design.

Program Representation as Trees

Given the data structure definitions for the MiniJava syntax, one can (in principle) manually represent any MiniJava program in memory, though this method is too boring and error-prone to do in everyday programming practice, it does clearly illustrate what's going on under the hood. And also this is the dominant programming styles in some functional languages, like Lisp or Scheme.

In the source ast/Fac.java, we've offered you the code on the representation of the sample MiniJava program test/Fac.java in memory.

Exercise 2. Read the code in the file src/ast/Fac.java, and make sure you understand how the program Fac.java is represented in memory. Then write some code to represent the MiniJava program test/Sum.java.

Pretty Printing and Visitor Pattern

The manual (or even automatic) approach of program representation is complex enough, and in order to gain confidence that the representation is correct respect to the original program text, we would like to double check. The standard way to do this is called pretty printing, which is a classical algorithmic research topic.

Again, the implementation of pretty printing is highly implementation language-dependent. For the Java language, the standard way to accomplish this task is to use the so-called visitor pattern. Chapter 4 of the Tiger book gives an excellent introduction to the visitor pattern, you may also refer to some books on design pattern to understand how visitor pattern works.

Exercise 3. Read again the code in the src.ast package, especially these code:

src.ast.Visitor.java
src.ast.Accceptable.java
src.ast.PrettyPrintVisitor.java

Make sure you understand the relationship between these three files.

Compile and then run the Tiger compiler:

  $ java -cp bin Tiger -testFac

which will pretty print the program ast/Fac.java, However, as you may notice from the output, the result for this pretty printing is not correct, there are three lines of code that are different from the original program text:

  System.out.println (new Fac().ComputeFac(10, ));
  public int ComputeFac(int num, )
  num_aux = num * this.ComputeFac(num - 1, );

These are actually bugs, detect where these bugs are triggered and fix these bugs by modifying code. Don't forget to do regression tests.

Exercise 4. The current implementation of the pretty printer in the file src.ast.PrettyPrintVisitor.java are incomplete, i.e., some methods come with empty bodies. Complete those methods by filling in the missing code.

Challenge! Write another pretty printer src.ast.PrettyPrintCSharpVisitor.java, to print out C# code. Does this fact reveal that one can use the same abstract syntax tree to represent both Java and C#?

Challenge! The pretty printer in the file src.ast.PrettyPrintVisitor.java, though enough for processing MiniJava code, may be not that general and powerful. Study some general pretty printing algorithms and implement them into your Tiger compilers. For instance, you may start with Wadler's printer.

Tree Generation

The parser can construct the abstract syntax tree automaticall by it's semantic action attached with the parsing methods.

Exercise 5. Modfity the parser src.parser.Parser.java to add semantic actions to each parsing method. There is hooking code in src.Main.java which will call the new parser and to pretty print the parsing result (abstract syntax tree).

To this point, your Tiger compiler can parse all legal MiniJava programs, generate abstract syntax trees and performs pretty printing on the trees. Don't forget to test your compiler with:

  $ java -cp bin Tiger <inputFile> -dump ast

Fix any bugs before you continue.

Challenge! If you've done the LALR parser from the previous lab, then generate the abstract syntax trees from that parser by adding semantic actions.

Part B: Elaborator

Before continuing to do other operations on the abstract syntax trees, one must check these trees to ensure they are well-formed. By the terminology well-formed, we mean that the input MiniJava program must obey all constraint rules specified by the MiniJava language specification (in turn by the Java specification). Typical rules include: a variable must be declared first before its use; the "+" operators must apply to two operands of integer type; the methods being invoked on an object must have been defined in its class or superclass; and so on. All such checking are preformed by an elaborator in the Tiger compiler. Note that in the literature, there are other names for the elaborator, say type checker or semantic analyzer, but we will call it an elaborator here in this lab.

Symbol Tables

Just like the pretty printing, the process of elaboration involves some kind of (post-order) tree traversal. However, there is a key difference: the elaboration traversal is context-sensitive, thus one have to record necessary context information. One can use some tables to record these. For historical reason, these tables are often called symbol tables, for it associate symbols with their associated informations (such as types, scopes, etc..).

The design of the symbol table is highly-dependent on the language being compiled. As for MiniJava, there are instance variables and methods in each class, so one need one big symbol table, which maps class name to its containing variables and methods; we call this table the class table. And each method contains parameters and locals variables, so one need a symbol table for each method which maps parameters or locals to their corresponding types (among other informations); we call this table the method table.

Exercise 6. Read the code in src.elaborator.ClassTable.java for the data structure defining the class table. And read the code in src.elaborator.MethodTable.java for the data structure defining the method table. Draw figures about the organization of these tables.

It's often the case that one need to dump the content the class table or method table after they have been created, to make sure that they contain everything one need. Now finish the methods dump() in both the two table classes for this purpose. Make sure that your Tiger compiler compiles with:

  $ java -cp bin Tiger -elab classTable -elab methodTable <inputFile>

Elaboration Rules

The elaboration (type checking) is a post-order traversal of the abstract syntax tree. For sub-trees, the elaborator returns the type of that sub-tree; and for root-node, the elaborator compares the types returned from the sub-trees and guarantee the sub-tree types can be combined in some proper way. For instance, for the tree representing an addition node e1+e2, the elaborator first traverse e1 to obtain a type t1, and then e2 to obtain a type t2; at the root node, the elaborator will check that both t1 and t2 are int. Again, these checking rules are language-dependent, and for the Java language, these rules are well specified in the language specification.

Exercise 7. Finish the code in the elaborator src.elaborator.ElaboratorVisitor.java by filling the methods which have empty bodies. You may want to refer to the Java specification when necessary. But pay special attention to the minor differences between Java and MiniJava, say overloading is not allowed in MiniJava.

To this point, your Tiger compiler can parse and elaborate all possible MiniJava programs. Don't forget to test your compiler and fix any bugs before you continue.

Error Handling Recovery

It's the elabortor's job to report all possible semantic errors to the programmer and to recover from the current error to check as many errors as possible. The error handling code in current implementation is very rudimentary in that it only reports some information and then aborts.

Exercise 8. Modify the error reporting method error() in src.elaborator.ElaboratorVisitor.java to enable it to generate more informative error message about the error. And also modify it to generate all errors in the compiled programs, not just the first one.

Declaration Site and Use Sites

Like many other language, Java obeys a declaration-before-use rule, which simply means that for each variable in the program, there should be one and only one declaration site for that variable. However, it's not true for the vice versa: for one variable declaration, there may be zero or more number of uses of that variable; if the number of the uses is zero, then there is the chance of locating an unused variable. A good compiler may generate some error message like this one:

  Warning: variable "x" declared at line 10 never used

just as what your Eclipse has always been generating for your Java code.

Exercise 9. Modify your Tiger compiler to emit warning messages like the one above.

Handin

This completes the lab. Remember to hand in your solution to the information system.