Lab 1: Lexer and ParserLab OverviewIn these series of labs in this course, your job is to design and implement a compiler from scratch, called Tiger, for the MiniJava programming language. MiniJava is a non-trivial subset of the Java programming language, which is described in the appendix of the Tiger book. Meanwhile, you will use Java as the implementation programming language (so we are following the famous chicken and egg tradition in compiler design history). At the end of this course, you'll learn in depth how to implement a compiler for a modern OO language like Java, using Java.There are six labs and a final project planned:
This is the first lab. This lab consists of three parts: the Straight-Line Programming Language (SLP) and its interpreter and compiler; the Tiger lexer, and the Tiger parser. Software SetupYou should finish all labs on the Linux OS, for the Tiger compiler will make use of other tool chains on Linux. In this lab, the implementaion language you will use is Java, check here for the installation issues. Important note: make sure you've installed the latest Java compiler and Java runtime (version 8.0 or above, version 7.0 or below do NOT compile the Tiger code!). Eclipse is a very popular Java IDE. You can also use other IDEs, such as IDEA, netbeans, or others. Version control and GitIn each lab, we'll supply skeleton code for you to start with and modify. These files are controlled and distributed using the Git version control system. So, to access these files, you will need a Git client installed on your machine. On Linux, it's easy to install a Git client:# apt-get install git For those who have no experience in using Git before, don't forget to refer to the official manual when in doubt. Getting StartedWe've decided to put the Tiger code repository on GitHub (this courseware is open-source). Now, you can start by checking out the code repository for lab 1 to your machine:$ git clone https://github.com/bjhua/tiger.gitwhich will create a new folder tiger in
which resides all the source files just checked out for Lab 1.
Git is a distributed version control system, for it allows one to work with the local version of repository on her own machine. For instance, if you have finished exercise 5 of Lab 1, you can commit the changes by: $ git commit -am 'my solution to lab1 exercise 5'which will commit the changes to your local repository. Now import the project into your Eclipse and browse the source code: src/Tiger.java: the Tiger "Main" class src/lexer/*: the Tiger lexer src/parser/*: the Tiger parser src/control/*: options to control the behavior of the Tiger compiler src/util/*: utility classes src/slp/*: the SLP interpreter and compiler test/*: sample MiniJava programs as test casesNow, build the Tiger compiler on your machine (in Eclipse). As a result of the building process, a new bin folder
would be created. Now run the Tiger compiler on the
prompt:
$ java -cp bin Tiger -helpwhich will output something like: The Tiger compiler. Copyright (C) 2013-, SSE of USTC. Usage: java Tiger [options] <filename> Available options: -help show this help information -lex dump the result of lexical analysis -slp {args|interp|compile} run the SLP interpreter -testlexer whether or not to test the lexer Lab RequirementThere are two kinds of exercises: normal exercises and challenge ones. Challenge exercises may not be that hard, but may involve substantial code hacking. You are required to do ALL normal exercises. All challenge exercises are optional.Hand-in ProcedureWhen you finished your lab, zip you solutions and submit to the school's information system.Part A: SLP Interpreter and CompilerIn this part of the lab, you will do some programming exercises to warm you up. To be specific, you will write an interpreter and a compiler for a small programming language SLP, which stands for Straight-Line Programming language. The syntax for SLP is given in chapter 1 of the Tiger book. As its name illustrates, SLP is very simple in that it contains no control structures as found in most languages.This part of the lab serves as an introduction to environments (symbol tables mapping variable names to information about the variables); to abstract syntax (data structures representing the phrase structure of programs); to recursion over tree data structures, useful in many parts of a compiler; and to a functional style of programming without assignment statements. SLP Abstract SyntaxSLP contain just two syntactic forms: statements and expressions. You can read its syntax definition in chapter 1 of the Tiger book.Exercise 1. Read the SLP syntax in the chapter 1 of the Tiger book, make sure you understand how an SLP program is formed. As a specific example, explain to yourself why this program is legal according to SLP syntax: a := 5+3; b := (print(a, a-1), 10*a); print(b)What's this program's output? The central idea of a compiler is to encode the language syntax with some data structures. We will delve into this topic in lab 2. For the purpose of this lab, we have offered you some Java classes which encode the SLP syntax. Exercise 2.
Read the class definitions in a := 5+3; b := (print(a, a-1), 10*a); print(b) Maximum Argument NumberA program may contain zero, one or moreprint
statements, each taking one or more arguments. For instance, the
above SLP program has two print statements: the
former one has two arguments and the latter one has one argument.
The maximum argument number of a program is the maximum number of
arguments from all print statements.
Exercise 3.
Write a Java function $ java -cp bin Tiger -slp argswhich should output: $ 2your code is buggy, if your code output something other than this. Fix any bug before continuing. InterpreterGenerally speaking, there are two ways to run a program: interpretation and compilation. In this part, you will first write an interpreter for SLP, and in the next part, you will write a compiler so that you can gain deeper understanding of both the two ways.
Basically, an interpreter runs a given program
online, that is, it analyzes the program and mimic the
behaviour of the program during the analysis. As SLP
has assignment statement like
In an imperative memory, when a variable Exercise 4.
Write a Java function $ java -cp bin Tiger -slp interpwhich should output: $ 8 7 80note that the number 7 is followed by a
newline character. Fix any bug before continuing.
A Compiler from SLP to x86We have also offered you a small compiler which compiles an SLP program to x86 assembly. We have offered you all the code, you can run the compiler:$ java -cp bin Tiger -slp compilewhich should generate an executable a.out . Of
course, you can see the generated x86 assembly via
an undocumented option keepasm (uhhh, this
is a backdoor? and you
can think what can happen, if you don't have the source
code of a compiler):
$ java -cp bin Tiger -slp compile -slp keepasmwhich should output an assembly file. Exercise 5.
Read the code to make sure you understand how an SLP
program is compiled to x86 assembly.
Now, you can try to compile another SLP program:
$ java -cp bin Tiger -slp compile -slp divWhat's the output? Modify the compiler's source code to exit more gracefully for "divide by zero". Part B: The LexerLexical analysis is very first phase of a compiler, which reads as input the program source files and outputs a series of tokens recognized. And the first step in designing and implementing a lexer is to design good data structures to represent the input and output, nevertheless to say, this task is implementation language dependent. In Java, the input can be represented as some kind of (buffered) input stream established from the program source text file, and the output token needs to be represented as another data structure.
The data structure defined for token can be found in
the class
Exercise 6.
There are a bunch of token types defined ( Generally, there are two approaches to implement a lexer: the hand-written approach and the automatic lexer generator approach. Both of them are very popular in complier implementations. In this lab, we will use a hand-written approach in implementing the lexer for Tiger, and this approach are also used by compilers such as GCC or LLVM.
The method Exercise 7.
Read the MiniJava specification and study carefully
the forming rules for each kind of token. Then supply
the missing code for the method
To this point, you've finished the lexer, and your Tiger
compiler should compile ALL the test programs offered in
the
As a driver, there are code
in
Before you continue, you should run your compiler on
these test cases and make sure your lexer works properly,
(for instance, to test your compiler on the input file
$ java -cp bin Tiger ./test/Fac -testlexerIf your compiler fails on any test cases, check your code, fix the bugs and re-test your compiler. Exercise 8. Your lexer must be fast enough to lex (reasonably) large Java programs. To test how fast your lexer is, you should use some relatively large Java test cases. Nevertheless to say, it's boring and error-prone to write such test cases by hand. So we can apply some automatic approach to generate large test cases. Download this monster generator and compile it: $ javac MonsterGen.javathen run it to produce a Monster.java code:
$ java MonsterGen 100000>Monster.javaAnd run your lexer on Monster.java to see the
running time. What's the largest monster your lexer can
deal with? Can Sun's javac compiler deal
with it?
Challenge!
For now, we use an integer type to
represent the Challenge! Modify your lexer to lex full Java, instead of MiniJava. After you finish this, you can use your lexer to lex some nontrivial Java projects, such as the Apache's source code. Challenge! Use a lexer generator to build your Tiger lexer. For instance, you may try the JFlex. Which kind of lexer is faster, the hand-written one or the automatically generated one? Part C: The ParserThe task of a parser is to parse the input program, according to the given language syntax (production rules), and to answer whether or not the program beging parsed is legal (or meanwhile to generate some internal data structure as a result). In history, the parser may also generate target code directly, but recent compilers seldomly do this way, for the increasing complexivity of the languages and the increasing power of a modern computer.The parser we'll implement in this part of the lab will analyze the program being compiled and check whether or not the program is valid according to the MiniJava syntax. If so, the parser just succeeds and does nothing; if not, the compiler will output some error message indicating what the syntactic error is and where. (But in the next lab, we'll expand this parser by attaching semantic actions to let the parser generate an abstract syntax tree.) Just as the lexer, there are generally two approaches to implement a parser: hand-written or using an automatic parser generator. The parser we'll implement in this part for the Tiger compiler is a hand-written recursive descendent one. That is, all production rules become a bunch of recursive functions with (approximately) one function for one nonterminal. Exercise 9.
Finish the recursive descedent parser in the
file /test and your
own test case. Fix possible bugs.
Exercise 10.
The current implementation of the error handling and
error recovery method Exercise 11.
Recompile the above Challenge! Use some parser generators to re-implement the parser. For instance, you may use CUP, or AntLR, or SableCC, etc.. Again, this exercise may be not that difficult, but may require considerable modification to current code base and a lot of programming effort. HandinThis completes the lab. Remember to hand in your solution to the information system. |