Assignment 5: Theory of EUF


Overview

In the class, we discussed the theory of equalities and uninterpreted functions, essentially, the only important property of uninterpreted functions is the congruence property, which states that two functions are equal if and only if two functions are the same, and their corresponding arguments are also equal. In this assignment, you'll be familiarizing yourself with the basic implementation of equality and uninterpreted functions in Z3; and you'll also build a translation validator for a very simpler optimizing compiler using the theory of equality and uninterpreted functions. In the following of this assignment, we'll name theory of equality and uninterpreted functions as EUF.

This assignment is divided into two parts, each of which contains both some tutorials and problems. The first part is some basic SMT problems for EUF; and the second part is about building a translation validator for a simple optimizing compiler. Some problems are tagged with Exercise, which you should solve. And several problems are tagged with Challenge, which are optional.

Before starting with this assignment, make sure you've finished Software Setup in the assignment 1, and have Z3 and Python properly installed on your computer. For any problems, please feel free to contact us for help.

Hand-in Procedure

When you finished the assignment, zip you code files with file name studentid-assignment5.zip (e.g SA19225111-assignment5.zip), and submit it to Postgraduate Information Platform. Any late submission will NOT be accepted.


Part A: Basic EUF theory

In this section, let's start to learn how to solve basic SMT problems in EUF theory with Z3. To declare an uninterpreted function and term, you can declare a new sort, and terms and functions on that sort:

    S = DeclareSort('S')
    e = Const('e', S)
    f = Function('f', S, S)
    g = Function('g', S, S, S)

the above code declared a new abstract sort S, and a term e on S, and a function f with signature S->S, that is, function f accepts argument of type S, and returns value of type S. And function g is very much like the function f, except for function g accepts two arguments, both of them are type S.

Given the above declarations, we can ask Z3 to solve EUF constraints, as in:

    solve(e != e)
    solve(e == f(e))
    solve(e == f(g(e, e))
Exercise 1: Download file exercise1.py and run it. Explain the output to yourself. Add any other constraints, and explain the output.

The most important rule to connect terms is the congruence rule:

            s1=t1      ...      sn=tn
    ------------------------------------------- (Congruence)
         f(s1, ..., sn) = f(t1, ..., tn)

Theory of EUF if often used to prove the equivalence of different implementation of the same algorithms. In the class, we discuss such an example to calculate the integer powers. In the following exercise, you'll be implementing the equivalence of the two algorithm.

Exercise 2: Download and finish the code in the equiv.py file, to prove the equivalence of the two algorithms.

Part B: Translation validation

In the class, we discussed about translation validation, in this section, you'll implement a translation validator for an optimizing compiler for a small language.

A compiler is a software translate a program in a source language into the equivalent program in some target language. An optimizing compiler will optimize the program during the translation, in order to improve the code, for instance, to improve execution efficiency or to reduce code size, etc..

Nevertheless to say, a compiler is one of the most important pieces of the software stack, which we must guarantee the high confidence of its correctness. For this purpose, we can engineer a verified compiler, like the CompCert, in which every phase of a compiler is proved to be correct. Though this approach offers a very high confidence of correctness, the engineering effort is very high, due to the fact that all the correctness proofs must be carried out manually.

In contrast, a translation validation approach does not try to the correctness of the translation, but to validate the output from each compiler run is equivalent with the input program. This can be summarized in the following figure:

    input(source language) ---> compiler translation ---> output(target language)
      |                                                       |
      ------------------------> translation validator <--------

It's not hard to see, this is weaker than the fully verified approach, because we don't prove that the compiler is free of bugs, but just verify the output of each compiler run. Though this is a lightweight approach, the key advantage of this approach is this can be fully automated.

The source language we want to compile is called Calc, which is a shorthand for Calculator language. The context free grammar for this language is given by:

    bop ::= + | - | * | /
    E   ::= x | E bop E
    S   ::= x=E
    F   ::= f(x1, …, xn){S;* return E;}

the non-terminal bop stands for an binary operator, we have a bunch of syntactic forms: the addition, subtraction, multiplication, and division.

the non-terminal E stands for an expression, we have a bunch of syntactic forms: a basic variable x, binary operation of two expressions.

The non-terminal S stands for a statement, there is only one syntactic form: the assigment.

A function F has a sequence of arguments: x1, ..., xn, and its function body consists of a sequence of statement S, followed by a return statement, and the return statement always returns a variable x.

The first step is to encode the syntax in Python, this is included in the calc.py file.

Exercise 3: Download and read through the calc.py and counter.py file, and make sure you understand the data structure Expr, Stmt and Function. You don't need to write any code.

As we discussed in the class, in order to reason about programs in this language, we should convert programs in this language into static single assignment form, or SSA form. SSA conversion, generally speaking, is a little tricky to do, but for the purpose of this language Calc, the conversion is not very complicated, as Calc does not contain control-flow constructs. The key algorithm is to generate a new variable name for each variable that gets assigned to, and rewrite expressions to refer to the newly created variable names.

Exercise 4: Continue on the calc.py file, and make sure you understand the functions to_ssa_func(), to_ssa_stmt() and to_ssa_expr(). You don't need to write any code.

Once we have the SSA form, we can get Z3 constraints on that form.

Exercise 5: Continue on the calc.py file, and make sure you understand the functions gen_cons_func(), gen_cons_stmt() and gen_cons_expr(). You don't need to write any code.

Now let's continue to discuss the target language. The language we defined is called Tac, which is a shorthand for three address code, this is one of the popular compiler intermediate languages. The context free grammar for Tac is presented below:

    S ::= x=y | x=y+z | x=y-z | x=y*z | x=y/z
    F ::= f(x1, ..., xn){S;* return x;}

there are several forms for a statement S: the variable assignment, the addition, the subtraction, the multiplication, and the division. As the grammar shows: one essential feature of Tac is that all computation (expressions) are atomic.

Exercise 6: Download and read through the tac.py, and make sure you understand the data structures Stmt and Function for defining the syntax of Tac. Please finish the magic method __str__ in Stmt and Function. After finishing this part of code, don't forget to test your implementation.
Exercise 7: Continue on the tac.py file, and finish the SSA conversion functions to_ssa_func() and to_ssa_stmt(). Test your implementation when finished.
Exercise 8: Continue on the tac.py file, and finish the Z3 constraint generation functions gen_cons_func() and gen_cons_stmt(). Test your implementation when finished.

Having both the source language Calc and the target language Tac, we can now implement the compiler from the source to the target. The compiler implementation is in the file compiler.py, there are roughly three functions:

Exercise 9: Finish the compiler implementation by filling in the missing code in compile_expr() in the source file compiler.py. Test your implementation when finished.

Now, we have finished the implementation of the compiler, we can implement the translation validator for the compiler. Essentially, a translation validator will check the equivalence of the constraints generated from the source program and the target program generated by the compiler.

Exercise 10: Finish the translation validator in compiler.py by filling in the missing code in the function translation_validation(). Test your implementation when finished.
Challenge: Scaling the translation validation techniques to real production optimizing compilers. You can first extend the source and target languages, by supporting more language features; then you will extend your compiler. Although the principal is essentially the same, the compiler will get complicated for supporting more language features. You can take a look at this paper to get some ideas. (Note that in different translation validators, the notion of program equivalence may be different.)


This completes this assignment. Remember to zip you homework with file name studentid-assignment5.zip (e.g SA19225111-assignment5.zip), and submit it to Postgraduate Information Platform.

Happy hacking!