Lab 1: mini-JVM
Due time: to be announced, but we advise you to start as early as possibleIntroduction
This lab requires you to write a virtual machine (an interpreter) for a small stack-based low-level language called mini-bytecode, just like the well-known Java bytecode.
Requirement
You should download this package, in which we have offered you a code skeleton. Here is a brief discription of the files:
call-main.sml
: the entry point of whole programmain.sig, main.sml
: control all parts of the mini-JVMfile-reader.sig, file-reader.sml
: read as input a bytecode file, to form a char listbytecode.sig, bytecode.fun
: the internal representation of a bytecode fileparser.sig, parser.fun
: parse the input char list, and form the internal representation of a bytecode filejvm.sig, jvm.fun
: the mini-JVM execution enginesources.cm, sources.mlb, jvm.mlb
: various makefilestest/*
: some test casesutil/*
: various utility library
First, compile the sources with MLton:
mlton -verbose 2 jvm.mlbwhich would normally produce an executable
jvm.exe
or jvm
(If you don't produce the executable, contact us!). And now you may try to
run a test case and see the result by typing:
jvm -verbose 2 test/gcd.classYou may see the execution information ended with an exeception
unhandled exeception: AddYourCodewhich means that you must add your code at some point (as we'd point out a little later) to make the executable run correctly. (Question: what other options jvm supports besides
-verbose
?)
The high-level execution flow is: first, file-reader
read from a text file and return its content as a list of char; and
then parser
parses the char list into bytecode
, and
finally, the jvm
executes the internal representation of
bytecode and print out the final result to standard out.
Your job is to finish
the files "bytecode.fun", "parser.fun"
and
"jvm.fun"
. The code you should modify looks like:
raise AddYourCodeSo you should find all place marked as this and replace them with your code.
We have also provided you two test cases:
"gcd.class"
and "sum.class"
, as can be
found in the "test/"
sub-directory. The correct output should be "Result:4"
and "Result:55"
separately. After you finish all the files,
you can compile by "CM.make "sources.cm";"
using SML/NJ
or by "mlton jvm.mlb"
in command-line using MLton to test your code.
mini-Bytecode Syntax
The mini-Bytecode language is stack-based, supporting following instructions, in which i is an integer between 0 and 127.
- pop
- pop top element off the stack
- add
- add
- sub
- subtract
- swap
- swap top two elements
- push i
- push value i onto stack
- jump i
- jump to instruction i
- jeq i
- jump to i, if top two equal
- jlt i
- jump to i, if top two less
- store i
- save top element as variable i and pop off
- load i
- load variable i onto stack
- stop
- stop execution
Program is a sequence of these instructions. For example, here is a program to find the greatest common divider of 12 and 8.
0 : PUSH 12 1 : PUSH 8 2 : JEQ 8 3 : JLT 6 4 : SUB 5 : JUMP 2 6 : SWAP 7 : JUMP 2 8 : STOP
Input file format
Java bytecode program are encoded in binary form, which are relatively hard and tedious to decode. Hence, we have made things much simpler by encoding the mini-Bytecode in Ascii form, rather than in binary form.
First, integers will be encoded as a character. Here ord maps a character to a number according to ASCII chart. And chr is the opposite of ord.
encInteger i = chr(ord "0" + i)
So, 1,2,3 will be encoded as "1","2","3" and 41,42,43 as "Y","Z","[".
The encoding of all the instructions is defined as function enc. Here ^ means to concatenate two strings. "a" ^ "b" = "ab".
enc(pop) | "p" |
enc(add) | "+" |
enc(sub) | "-" |
enc(swap) | "s" |
enc(push i) | "c" ^ encInteger(i) |
enc(jump i) | "j" ^ encInteger(i) |
enc(jeq i) | "=" ^ encInteger(i) |
enc(jlt i) | "<" ^ encInteger(i) |
enc(load i) | "r" ^ encInteger(i) |
enc(store i) | "w" ^ encInteger(i) |
enc(stop) | "." |
For example, the program showed in previous section is encoded as "c<c8=8<6-j2sj2.".
How mini-JVM runs?
A mini-Bytecode program is a list of instructions and
we mark them from 0 as showed in section 2.
The machine state during the execution is (stack, store, pc)
. Here
stack is the data stack,
store is a memory containing all variables, pc
is the program counter (i.e., the next instruction in the instructions
list to be executed). For example, ([1,2], [0:12,1:8], 3)
means we have a stack of [1,2] , a store mapping variable 0 to 12
and variable 1 to 8, and we are going to execute the 3rd instruction.
Suppose that before an instruction instr
executes, the
machine state is ([x, y, ...], store, pc)
(remember that
now pc
points to instr
),
we describe the effect of executing the instruction instr
in the table below.
Instruction | Machine state after execution |
---|---|
pop | ([y,…],var,pc+1) |
add | ([(x+y),y,…],var,pc+1) |
sub | ([(x-y),y,…],var,pc+1) |
swap | ([y,x,…],var,pc+1) |
push i | ([i,x,y,…],var,pc+1) |
jump i | ([x,y,…],var,i) |
jeq i | if x=y then ([x,y,..],var,i) else ([x,y,…],var,pc+1) |
jlt i | if x<y then ([x,y,..],var,i) else ([x,y,…],var,pc+1) |
load i | ([var(i),x,y,…],var,pc+1) |
store i | ([y,…],update var(i) to be x,pc+1) |
stop | ([x,y,..],var,pc) |
The initial mathine state is ([], [], 0)
, which
means an empty stack and store, and we run from the 0th instruction.
Program stops when we reach stop
, and the top element
on stack is treated as the final result.
For example, the program in section 2 will execute as below. And the result is 4.
PUSH 12 | ([12 ],[ ],1) |
PUSH 8 | ([8,12],[ ],2) |
JEQ 8 | ([8,12],[ ],3) |
JLT 6 | ([8,12],[ ],6) |
SWAP | ([12,8],[ ],7) |
JUMP 2 | ([12,8],[ ],2) |
JEQ 8 | ([12,8],[ ],3) |
JLT 6 | ([12,8],[ ],4) |
SUB | ([4,8],[ ],5) |
JUMP 2 | ([4,8],[ ],2) |
JEQ 8 | ([4,8],[ ],3) |
JLT 6 | ([4,8],[ ],6) |
SWAP | ([8,4],[ ],7) |
JUMP 2 | ([8,4],[ ],2) |
JEQ 8 | ([8,4],[ ],3) |
JLT 6 | ([8,4],[ ],4) |
SUB | ([4,4],[ ],5) |
JUMP 2 | ([4,4],[ ],2) |
JEQ 8 | ([4,4],[ ],8) |
STOP | ([4,4],[ ],8) |
Handin
"Result:***"
.
The last thing to note is that you should not change the file name
of jvm.mlb
(but you can modify its content).