Programming Assignment #3: Virtual Machines and Compilers

Discrete Mathematics and Its Applications

Programming Assignment #3: Virtual Machines and Compilers

A virtual machine is a computer which is built with softwares, instead of chips and silicons. In this problem, our job is to design and implement a vitural machine called "CVM" for the miniC programming language (as defined in programming assignment #2). The CVM could run programs written in the miniC programming language.
The organization of CVM is simply a tuple:
```
CVM -> (store, prog)
```
where prog is defined in programming assignment #2. And the store is a mapping
```
store : id -> num
```
which maps identifiers id to integers num. Two useful operations could be defined on a store st:
```
st (id)
st [id |-> n]
```
the first one looks up and returns the integer value of id in the store st. And the second one updates the value of indentifier id with integer n in store st, and if id does not belong to the domain of st, this operation simply adds the binding from id to n into st.
With all the above notations, we can define a bunch of relations to .
The first relation (st, e) -> v is given by a function E(st, e) = v which classifies the execution of expressions. The execution of an expression e on some store st always returns e's value v (an integer). The detailed rules are presented below, and are inductively defined according to the structure of e.
```
E(st, num)    = num
E(st, id)     = st (id)
E(st, e1+e2)  = E(st, e1) + E(st, e2)
E(st, e1-e2)  = E(st, e1) - E(st, e2)
E(st, e1*e2)  = E(st, e1) * E(st, e2)
```
The second relation (st, s) -> st' is given by a function S(st, s) = st' which classifies the execution of statements. The execution of a statement s on some store st always returns a new store st'. The detailed rules are presented below, and are inductively defined according to the structure of s.
```
S(st, id=e)     = st [id |-> E(st, e)]
S(st, print(e)) = st        // with side effect of printing E(st, e) on the screen terminated with a new line
```
The third and last relation (st, prog) -> st' is given by a function P(st, prog) = st' which classifies the execution of a whole program. The execution of a program prog on some store st always returns a new store st'. The detailed rules are presented below, and are inductively defined according to the structure of prog.
```
P(st, stm prog) = P(S(st, stm), prog)
P(st, stm)      = S(st, stm)
```
Just as the above problem, we may also design and implement a virtual machine called "PVM" for the miniPentium language (also in assignment #2).
The PVM is also a tuple:
```
PVM -> (store, prog)
```
where prog is a miniPentium program and the definition of store is the same as the above problem.
Also, the execution of PVM is specified by a collection of relations. The first relation (st, v) -> n is given by function V(st, v) = n which formalizes the execution of operands v on some store st.
```
V(st, num)  -> num
V(st, id)   -> st (id)
```
The second relation specifies the execution of an instruction, and is given by a relation (function) I.
```
I(st, movl v, id)  -> st [id |-> V(st, v)]
I(st, addl v, id)  -> st [id |-> V(st, id) + V(st, v)]
I(st, subl v, id)  -> st [id |-> V(st, id) - V(st, v)]
I(st, mull v, id)  -> st [id |-> V(st, id) * V(st, v)]
I(st, print v)     -> st       // with side effect of printing V(st, v) on screen terminated with a new line
```
The third and final relation (function) P specifies the execution of prog.
```
P(st, instruction prog)  -> P(I(st, instruction), prog)
P(st, instruction)       -> I(st, instruction)
```
(Optional, extra credit.) A compiler is a program that takes as input programs written in one (typically high-level) programming langugage, and outputs corresponding equivalent programs written in another (typically low-level) programming language.
In this problem, our job is to design and implement a small compiler "miniVC" which compiles miniC into miniPentium. This task can be divided into several sub-problems:
- Code translator. The code translator translates miniC code into miniPentium code. It's defined via three recursive functions P, S, E.
  The function P is defined on miniC prog. Enssentially, it translates each statement in prog in turn.
```
P(stm prog) = 
     S(stm) 
     P(prog)

P(stm)      =
     S(stm)
```
  The function S is defined on statements stm. It translates each miniC statement into a series of miniPentium instructions.
```
S(id = exp)   =
     E(exp)
     movl result, id

S(print(exp)) =
     E(exp)
     print result
```
  The function E is defined on expressions exp. This function has two effects: first, it translates each miniC expression into a series of miniPentium instructions, and meanwhile it puts the value of this expression into a special variable result.
```
E(num)     =
     movl num, result

E(id)      =
     movl id, result

E(e1 + e2) = 
     E(e1)
     movl result, t1
     E(e2)
     movl result, t2
     addl t2, t1
     movl t1, result
     
E(e1 - e2) =
     E(e1)
     movl result, t1
     E(e2)
     movl result, t2
     subl t2, t1
     movl t1, result

E(e1 * e2) =
     E(e1)
     movl result, t1
     E(e2)
     movl result, t2
     mull t2, t1
     movl t1, result
```
  Hint: you may choose to implement the result variable as an external variable or as a return value of function E.
- Type checker. Not all miniC or miniPentium programs are valid. For instance, the following program is illegal, for that the variable x is uninitialized:
```
y = x;
print (y);
```
  And the execution of this program will either core dumped or print an arbitary junk value. To cure this, we may statically check whether there exist uninitialized variables in a program, and if so, we'd like to issue a friendly warning or error message to the programmer, instead of going into crazy core dumped at runtime. This process is called (a form of) type checking.
  Our job is to design and implement a type checker to check uninitialized variables in miniC and miniPentium. We don't give formal checking here, it's your task to think careful what the proper rules will be.
- Optimizer. The task of an optimizer is to convert a source program into an equivalent target one, with less code size or higher performance. For instance, here is an example:
```
x = 8;          opt1    x = 8;         opt2    x = 8;        opt3
y = 9;         ======>  y = 9;        ======>  y = 9;       ======>
z = x + y;              z = 8 + 9;             z = 17;
print (z);              print (z);             print (17);           print (17);
```
  Opt1 is called contant propogation which substitutes variables with their specific values. And opt2 is called constance folding which calculates an expression's value, if this expression is composed of only constants (note that it's followed by a contant propogation again). And opt3 is called dead-code elimination which eliminates instructions that has no effects on program output.
  (1) Your job is to implement all these optimizations on miniC and miniPentium. Is it true that there will always be a print statement left, after all such optimizations applied?
  (2) Are there any other kind of optimizations that could be applied to the miniC or miniPentium? If so, try to implement your ideas.
- Real Pentium. How to run the miniPentium programs on a real Pentium machine? There are at least three approaches to achieve this:
  (1) Variables in static region. In this approach, all variables are put into the static data region on Pentium machine. And some instructions may be revised to meet the Pentium convention. Does this approach support function calls? And does this method scale to the full C languag?
  (2) Variables on stack. All variables in a given program are put into the function local stack. As above, some instructions must be revised.
  (3) Variables in registers. All variables are put into machine registers, and this process is called register allocation.
  
  Which the above approach the GCC compiler and the MS VC6.0 compiler use? And why?
(Optional, extra credit.) A stack machine may have no registers, and all computations are employed on a stack (and hence the name). (The most famous state-of-the-art stack machine may be the Java Virtual Machine---JVM.)
In this problem, our task is to design and implement a stack-based virtual machine "SPVM".
(1) First we would like to revise the definition of the miniPentium language to make it stack-based (call it stackPentium). For this strategy to work, we may revise the existing instructions add some new ones to miniPentium. For instace, we may revise the definition of instruction to this one:
```
instruction -> addl
            -> subl
            -> mull
            -> print
            -> pushl v
            -> popl  id
```
Note that we've add two new instrutions pushl, popl. And note some instructions have been revised heavily. For instance, the addl instruction now take no operand at all. And the original addl v, id instruction could be expanded:
```
addl v, id  ======> pushl v
                    pushl id
                    addl
                    popl  id
```
Other instructions could be revised in a siliar way.
(2) Design a stack-based virtual machine for the stackPentium. For this to work, we would implement a stack in this machine.
(3) Write a small compiler from miniPentium to stackPentium. And write a small compiler from miniC to stackPentium.
(4) What are the win and hurt of the stack machine, comparing with the variable (register) machine in the second problem.