Lab 5: OptimizationsLab OverviewIn this lab, you'll add optimizations into yourTiger compiler, to make it
generate
better target code. There are 3 parts in this lab: in part A, you
will write a bunch of optimizations
on the abstract syntax tree: dead class eliminations,
constant folding, algebraic simplifications, and
dead code elimination, and so on. In part B, you will
write a new intermediate representation: control flow
graph (CFG), and you will write translator from C to CFG, and
you will port your code generator and garbage collector to CFG. In
part C of the lab, you will write several classic data-flow analysis
on CFG: liveness analysis, reaching definitions, available expressions,
and so on; and you will write several optimizations: constant propagation,
copy propagation, dead code elimination, and common sub-expression
elimination, etc..
Getting StartedFirst check out the source we offered you for Lab5:$ git commit -am 'my solution to lab4' $ git checkout -b Lab5 origin/Lab5these commands will first commit your changes to the Lab4 branch of your local Git repository, and then create a local Lab5 branch and check out the remote Lab5 branch into the new local Lab5 branch. Again, you will need to merge your code from Lab4 into the new Lab5 branch: $ git merge Lab4Don't forget to resolve any conflicts before committing to the local Lab5 branch: $ git commit -am 'lab5 init' You should first import the new Lab5 code into your Eclipse editor, and make sure the code compiles. There are a bunch of new files that you should browse through: ast.optimizations.* various optimization on the AST cfg.* control-flow graph on optimizations on it Hand-in ProcedureWhen you finished your lab, zip you code and submit to the school's information system.Part A: Optimizations on the ASTOptimizations are program rewritings with the goal to make the optimized code faster, smaller, and so on. Optimizations can take different flavours: from imperative style to functional style and others. The optimizations you'll design and implement in this lab will make use of a functional style, in which a new program will be built (as the result of the optimization) without modifying the old program (the program being optimized). It'll be easier to write, debug, and maintain your code to use this style. Don't worry about what a functional style is, if you have not heard of it before. After this lab, you will get a deep understanding of the functional programming style and how to program in it (using OO).Optimizations are highly dependent on the specific intermediate representation (IR) used: an optimization may be easy to perform on some kinds of IRs but difficult, if not impossible, on other kinds of IRs. Strictly speaking, as a high-level intermediate representation, abstract syntax trees are not a good IR for the purpose of optimizations, because the control/data flow information are obscure on an AST. As a result, only a few optimization are eligible for ASTs. However, it's not a bad idea for you to write optimizations on the AST as a starting point: first, it will give you some general ideas about how the functional-style optimization strategy work throughout this lab; second, the optimizations you will write on the ASTs, on some cases, will simplify the AST dramatically and thus make later phases of the compiler simpler. Compiler Passes, Timing and TracingYour Tiger compiler has become much complex for now. And because you will add many optimizations in this lab, so it's very important to control the behaviors of your Tiger compiler and all these optimizations. For instance, optimization order are very important because some optimizations are effective only under a specific order. And furthermore, it's also very important to obtain and analyze the execution time of each optimization, so that you can control the tradeoff of optimizations (for example, you may want to drop or skip some optimizations, if it spends too much time).For this, we have offered you, in lab 5 code, more command line arguments to control your Tiger compiler more handily. You can find the new arguments by running: $ java Tiger -especially, you should pay attention to the -skip ,
-trace and -verbose arguments. The
first one will skip some optimizations, and the second one
will trace the execution of some optimizations, and finally the
last one will show the compilation passes in detail.
As an example, read the code in the file
Exercise 1. Be sure you can answer all the following questions:
Dead Class EliminationAny nontrivial Java project may contain many Java classes, however, not all classes are used in that project. So the primary goal of a dead class elimination optimization is to analyze which class will never be in the project and thus can be considered dead. By eliminating all the dead classes from the project, you can shrink the program size considerably. For instance, consider the test casetest/DeadClass.java , the
class Fac is a dead class and thus can be safely
eliminated.
Exercise 2. Design an algorithm to perform dead class elimination. Analyze the computation complexity of your algorithm. Be careful of pitfalls, don't eliminate live classes.
You can use a work-list
algorithm to perform dead
classes elimination in a given Java program. First, introduce a work-list
to hold all live classes. You can read the code in the package
Exercise 3.
Read the code in the package -trace argument, as in:
$ java Tiger -testFac -trace ast.DeadClass Dead Code EliminationBesides dead classes, there may be other forms of dead code in a given Java program. For instance, consider the test casetest/DeadCode.java , there is some dead code
in it (which?). So the purpose of a dead code elimination
is to remove code that can be statically proved to be dead.
Exercise 4.
Design an algorithm to perform dead class elimination. Analyze
the computation complexity of your algorithm. And then implement
your algorithm in the file Algebraic SimplificationThe purpose of an algebraic simplification is to simplify program phrases, according to some algebraic rules. For instance, in the test casetest/AlgSimp.java , you can find
one such instance: the code x*0 can be simplified to
0 .
Exercise 5.
Design an algorithm to perform algebraic simplifications. Analyze
the computation complexity of your algorithm. And then implement
your algorithm in the file Constant FoldingThe basic idea of constant folding is to compute the values of an expression statically at compile time. Consider the test casetest/AlgSimp.java , after algebraic
simplification, the if condition expression becomes
0<2 , obviously, it can be further simplified
to true .
Exercise 6.
Design an algorithm to perform constant folding. Analyze
the computation complexity of your algorithm. And then implement
your algorithm in the file Exercise 7.
The code base in the file Part B: Control-flow Graph (CFG)A control-flow graph (CFG) is a graph-based program representation with basic blocks as nodes and control transfers between blocks as edges. A control-flow graph is a good candidate intermediate representation for it makes optimizations easier to perform. In this part of the lab, you will construct the control-flow graph representation of a given program, and write various optimizations on CFGs.Control-flow Graph (CFG)You will first familiarize yourself with syntax trees for the control-flow graph. We have offered you most of the code for the control-flow graph in the packagecfg.* . And you should read all the code
first. Basically, the data structures for this CFG is very much like
the C AST from lab 3: virtual method tables, structures (classes), main
method, other (virtual) methods, and so on.
Exercise 8. Make sure you read the following code:
codegen.C.* .
cfg.method.Method.java ).
Basic Blocks (BBs) So what are basic blocks? A basic block is a sequence of code that can only enter from the first instruction and exit from the last instruction. So, intuitively, you can only enter a basic block from the first instruction and executes each instruction contained therein in turn, until reach the last instruction; you can NOT enter or exit from the middle of a basic block. Exercise 9.
You can find the data structures defining a basic block in the
code
The statements in a CFG (code
The name three-address code reflects one important property
of this intermediate
representation: in general, most statements contain at most
three operands (addresses)
(although variants of TACs may contain more or less addresses).
For instance, consider the
In the package Exercise 10.
Read all the code in the package
Now, you can compile the test case $ java -cp bin Tiger -testFac -dump Cthis will produce a control-flow graph in a C file a.c , along
with a C code from lab 3 part A in file a.c.c . Note that
in the control-flow graph, there is the following line at the 4th
line:
// Control-flow GraphSkim these files and compare them briefly. Graph VisualizationIt will be nice to visualize a control-flow graph, so it will be easier to analyze the graph. Fortunately, the Tiger compiler comes with this nice feature. Nevertheless to say, to draw a figure prettily is nontrivial and may lead to much programming effort, so your Tiger compiler will make use of the Graphviz software to visualize a control-flow graph. Graphviz is a very popular software for graph visualization (both in compilers and other fields).Exercise 11. Download graphviz from its official download site, and install it on your machine. Don't forget to add it to your PATH. To make sure you've installed graphviz correctly, you can run its command on your prompt: $ dot -?will will print something like: Usage: dot [-Vv?] [-(GNE)name=val] [-(KTlso)Also, you can take a look at its manual, if you are interested in. Exercise 12.
To visualize a CFG for each method, run your Tiger compiler using
the $ java Tiger -testFac -visualize jpgThis command will draw two figures in jpeg format. In the jpeg file Fac_ComputeFac.jpg , you can see four blocks connected by
directed edges. However, in the blocks, there are strings like:
Your code here:you should add some code to substitute this line with real code in this block (you can refer to exercise 10 for the real code). Hint: the code you will modify resides in the file cfg.block.Block.java .
-verbose option to control
your Tiger compiler. And you
will continue to modify this part of the code in later part of
the lab.
CFG GenerationIn principal, you can generate CFG from the Java AST directly, however, you will "reinvent the wheel" by doing so, because many hard work has been finished in part A of lab 3. So a better idea is to write the code generator starting from the C code from part A of lab 3.
You can find this code generator in the visitor
Exercise 13.
Finish this code generator by filling in code for all methods. For this,
you may need to add more classes to package Part C: CFG-based OptimizationsCFG-based program representations make program optimizations much easier, due to the fact that it's much easier to calculate the control flow and data flow information required to perform optimizations. Generally speaking, an optimization can be divided into two phases: program analysis and rewriting. During the program analysis phase (usually data-flow analysis), the compiler analyzes the target program to get useful static information as an approximation of the program's dynamic behaviour. And such information will be used in the second phase: program rewriting, the compiler rewrites the program according to the static information obtained in the first phase. As a result fo these two steps, the program is altered in some way---it is optimized. For instance, in the dead-code elimination optimization, the program analysis phase will determine whether or not the variablex defined
by x = e will be used anywhere in this program; if
x will not be used, the program rewriting phase
will remove this assignment from this program (and the program
shrinks).
In this part of the lab, you will write several classical program analysis: liveness analysis, reaching definitions, available expressions, and so on. And you will also write several classical optimizations (based on these analysis): constant propagation, copy propagation, dead code elimination, etc.. Liveness AnalysisYou will write liveness analysis in this part of the lab. The liveness analysis you'll write calculate, for each statement and each transfer in a block, the live in and live out variable sets.You will write your liveness analysis algorithm (in fact, most data-flow analysis algorithms) on a block granularity, so you should take a look at the general steps in such an algorithm: data_flow() calculate the gen and kill sets for each statement and transfer; calculate the gen and kill sets for each basic block; calculate the in and out sets for each basic block; // fix-point calculate the in and out sets for each statement and transfer.For specific data-flow you will write next, you will need to modify this algorithm template to finish your work. Different data-flow algorithms will take minor differences. For instance, for the liveness analysis in this part of the lab, you should calculate live_in and live_out for the in
and out separately; and the fix-point code should
calculate the in and out sets for each basic block
in a reverse topo-sort order. But for other kind of data-flow
analysis, all these details may be different.
Exercise 14.
Finish the liveness analysis code in the file
$ java -cp bin Tiger -testFac -trace liveness.step1 Dead-code EliminationA statementx = e is dead, if the variable x
does not live out this statement. So, this statement can be safely
removed.
Exercise 15.
Finish the dead-code optimization in the file
Reaching DefinitionsYou will write the reaching definitions analysis in this part of the lab. Intuitively, the reaching definition will analyze which definition can reach which use.Exercise 16.
Finish the reaching definition analysis in the file
Constant PropagationSuppose, for any statements of the form x = y
or x = y op z , where y is defined in
some statement t like y = c where c
is a constant. And
suppose the definition t is the unique definition that reaches
the statement s. Then you can replace the variable y , in
the statement s, with the constant c from the statement t.
Exercise 17.
Finish the constant propagation code in the file
Copy PropagationCopy propagation is like constant propagation. Suppose, for any statements of the form x = y
or x = y op z , where y is defined in
some statement t like y = m with a variable m . And
suppose the definition t is the unique definition that reaches
the statement s. Then you can replace the variable y , in
the statement s, with the variable m from the statement t.
Exercise 18.
Finish the constant propagation code in the file
Available ExpressionsConsider the statementx = y op z , if the expression
y op z has been computed before this statement, then
the expression y op z can reuse the computed result.
Hence, you can compute whether or not the expression y op z
is available here.
Exercise 19.
Finish the available expression analysis in the file
Common Sub-Expression Elimination (CSE)For the statementx = y op z , if the expression
y op z is available. We can substitute the
expression y op z with the most recent definition
using y op z . You can read the Tiger book
section 17.3 for this algorithm.
Exercise 20.
Finish the constant propagation code in the file
-verbose to trace the
execution time of each pass.
Safe Points and Precise GCLiveness analysis has many important applications in compiler optimizations such as register allocations, and so on. But for this part of the lab, you will consider another kind of important application for liveness analysis: precise garbage collection.Recall that, in lab 4, you have written the Gimple garbage collector. To help Gimple find the roots, the Tiger compiler will generate GC stack frames telling Gimple where to locate the roots. One key point to note is that the GC stack frames is a one-to-one correspondence to the call stack frames, that is, there is just one GC stack frame on one call stack frame (and thus one function). Though simple to implement, this approach has serious drawbacks, it can not collect garbage precisely. Consider one challenge problem from the previous lab: A f (){ A x, y; x = new A(); // suppose that Java heap is out of space here... y = new A(); return y; }suppose that after allocating object x , there is no space
in the Java heap to continue to allocate y , so the
Gimple collector should take over. However, the GC stack frame
would contain x so that the object pointed by
x will NOT be collected. But a close look at
the code will reveal that that object can indeed be collected
because x will not live out the first statement.
In order to perform a much precise garbage collection, we can introduce the concept of safe points: positions which are safe to emit GC stack frames. Because the MiniJava language you are compiling is single-threaded, so the safe points are function invocations (including constructor invocations). The basic idea is to emit different GC stack frames before each function invocation, and each GC stack frame only contains the live references at that point (that is: the live in set of that function invocation). To better understand this, consider the code fragment above, you can generate two different GC stack frame for the two method invocations, like this: A f (){ A x, y; // generate the first frame (empty): {} x = new A(); // generate the second frame: {y} y = new A(); return y; }As you can see, the reference x will not appear
in any GC frame at all.
Exercise 21. Port your Gimple garbage collector to control-flow graph to do precise garbage collection. You should calculate the live in set for each method invocation and generate GC frame based on that. Exercise 22.
In the above code, the variable HandinThis completes the lab. Remember to hand in your solution to the information system. |