Existential Types
-----------------

In this lecture, we'll discuss the theory underpinnings for ADT
or objects---existential types. ADTs are essential concepts for
modern programming practice, which enable us to build various
components of a complex software system separately, and link
them together.

For instance, suppose we want to build a "point" abstraction
on two-dimension surface. Furthermore, suppose we only want
to build "point" of integers. Here is a possible interface
for this abstraction:
  
  signature POINT =
  sig
    type t

    val create: int * int -> t
    val fst: t -> int
    val snd: t -> int
  end

essentially, this interface specifies that there is some type
"t" to represent point type, and there are three operations:
the first one "create" create a new point from two integers;
while the "fst" and "snd" functions fetch the first and second
components separately. We can give the interface various
concrete implementations, for instance, we can implement
with a pair:

  structure PairPoint:> POINT =
  struct
    type t = int * int
    
    fun create (x: int, y: int): t = (x, y)
    fun fst (t: t): int = #1 t
    fun snd (t: t): int = #2 t
  end

or else we can implement it with an array with length 2:
  
  structure ArrayPoint:> POINT =
  struct
    type t = int array
    
    fun create (x: int, y: int): int array = 
        let val arr = Array.array(2, 0)
            val _ = Array.update (arr, 0, x)
            val _ = Array.update (arr, 1, y)
        in  arr
        end
    fun fst (arr: int array): int = Array.sub(arr, 0)
    fun snd (arr: int array): int = Array.sub(arr, 1)
  end

obviously, there may be infinite many possible implementations.
So, what's the type for the signature "POINT"? We can think that
the signature "POINT" consists of a tuple of an unknown type "t"
along with a group of operations on the type "t". With this
idea, we can write this signature as:

  \exists X.{create: int*int->X, fst:X->int, snd:X->int}

or (equivalently, in the text's natation):

  \exists {*X, {create:int*int->X, fst: X->int, snd:X->int}}

in the next, I'll be using the former notation.

Given the existential type definition, the concrete
implmentation, such as the "PairPoint" or the "ArrayPoint"
can be though of some packers to pack a concrete implementation
type and its code. We can use the following syntax:

  pack \exists.{T, e} as \exists.{X, T'}

the idea here is that we can pack the concrete type T and the
code e into an existential type \exists.{X, T'} where X is a
type variable. Essentially, we are hiding the implementation
type along with some type annotations in e's type. To make
this point concrete, let's consider the "PairPoint" structure:

  pack \exists.{int*int, {create=..., fst=..., snd=...}}
    as \exists{X, {create:int*int->X, fst:X->int, snd:X->int}}

it's worth remarking that the existential type is not unique,
for instance, we can also write the following code for the 
above abstraction:

  pack \exists.{int*int, {create=..., fst=..., snd=...}}
    as \exists{X, {create:int*int->int*int, fst:X->int, snd:X->int}}

here, we are exposing the internal representation type.

Now, let's pack the "ArrayPoint" structure into an existential
type:

  pack \exists.{int array, {create=..., fst=..., snd=...}}
    as \exists{X, {create:int*int->X, fst:X->int, snd:X->int}}

as this code shows, two different code can be packed into the
same existential type.

Let's write some client code to program upon the given "POINT"
interface:

  unpack {X, x} = pack \exists.{int*int, {create=..., fst=..., snd=...}}
                  as \exists.{Y, {create:..., fst:..., snd:...}}
  in x.fst(x.create (3, 4))

the idea here is that we can unpack an existential value
into two components: an abstract type variable X and a
value v. Note that by keeping the type variable X abstract,
we can protect the internal data from being accessed in
illegal ways. For instance, the following code does NOT type
check:

  unpack {X, x} = pack \exists.{int*int, {create=..., fst=..., snd=...}}
                  as \exists.{Y, {create:..., fst:..., snd:...}}
  in #1 (x.create (3, 4))

And it worth trying to substitute the "PairPoint" with
"ArrayPoint", i.e.:

  unpack {X, x} = pack \exists.{int array, {create=..., fst=..., snd=...}}
                  as \exists.{Y, {create:..., fst:..., snd:...}}
  in x.fst (x.create (3, 4))

The question here is that whether or not the client may know
which implementation we are using? Intuitively, the answer
is NO, for the client does not rely on information specific
to any concrete implementations. So we can abuse the freedom
to substitute the current implementation, as long as they does
implement that interface (existential type). This is often
called a "representation independence".


---------------------------------------

We now give the formal definition for the system. The syntax
consists of the new "pack" and "unpack" constructs:

  T -> Bool | X | T->T | \exists.{X, T}
  v -> true | false | \lambda x:T.e | pack \exists.{T, v} as T'
  e -> v | if(e1, e2, e3) | x | e e
     | pack \exists.{T, e} as T' | unpack {X, x}=e1 in e2

In the type T, we have the type variable "X", and the existential
type "\exists.{X, T}" (recall that in logic, it's often written
as "\exists X.T").

The value v contains the "pack" construct "pack \exists.{T, v} as T'",
which packs a module value into an existential value. Note that
the type T is concrete, while the type T' is abstract.

The expression e now consists of introduce and elimination
operations on existential type: "pack" and "unpack".

In order to present the operational semantics rules, we define
the evaluation context E as:

  E -> [] | if(E, e2, e3) | E e | v E
     | pack \exists.{T, E} as T' | unpack {X, x}=E in e2

We define the operational semantics in term of the evaluation
context:


---------------------------------------------(E-IfTrue)
  if(true, e2, e3) -> e2


---------------------------------------------(E-IfFalse)
  if(false, e2, e3) -> e3


---------------------------------------------(E-App)
  (\lambda x:T.e) v -> [x|->T]e


--------------------------------------------------(E-Unpack)
  unpack {X, x}=(pack \exists.{T, v} as T') in e
                           ->[X|->T][x|->v]e

The last rule E-Unpack deserves some explanation, it specifies
that to unpack an existential value, we fetch its concrete
type T and its value v, bind them to X and x separately which
will be used in the body e.

Let's study one example, consider the above client code on the
"POINT" interface:

  unpack {X, x} = pack \exists.{int*int, {create=..., fst=..., snd=...}}
                  as \exists.{Y, {create:..., fst:..., snd:...}}
  in (\lambda z:X.(x.fst z)) (x.create (3, 4))
  -> [X|->int*int][x|->{create=..., fst=..., snd=...}] 
               ((\lambda z:X.(z.fst)) (x.create (3, 4)))
  -> [X|->int*int]((\lambda z:X.({create=..., fst=\lambda x:X.#1 x, snd=...}.fst z)) 
          ({create=\lambda(x:int, y:int).(x, y), fst=..., snd=...}.create (3, 4))
  -> (\lambda z:int*int.({create=..., fst=\lambda x:int*int.#1 x, snd=...}.fst z)
          ({create=\lambda(x:int, y:int).(x, y), fst=..., snd=...}.create (3, 4))
  -> (\lambda z:int*int.({create=..., fst=\lambda x:int*int.#1 x, snd=...}.fst z)) 
          (\lambda(x:int, y:int).(x, y) (3, 4))
  -> (\lambda z:int*int.({create=..., fst=\lambda x:int*int.#1 x, snd=...}.fst z)) (3, 4))
  -> {create=..., fst=\lambda x:int*int.#1 x}.fst (3, 4)
  -> (\lambda x:int*int.#1 x) (3, 4)
  -> #1 (3, 4)
  -> 3

The type system for this formal system take the form of:

  G; D |- e: T

where D is a type variable environment, just as we have discussed
before. I only present here the existential-related rules:

  G; D |- e: [X|->T]T'
--------------------------------------------------------------------(T-Pack)
  G; D |- pack \exists.{T, e} as \exists.{X, T'}: \exitst.{X, T'}

  G; D |- e1: \exists.{X, T1}     G,x:T1; D,X |- e2: T2
--------------------------------------------------------------------(T-Unpack)
  G; D |- unpack {X, x}=e1 in e2: T2

The existential introduction rule T-Pack will abstract (hide)
the concrete type T (as a type variable X). It's worthing
remarking that the "as" clause is necessary because, generally
speaking, the type checker has no way to tell how abstract the
existential type should be, so it requires the programmer to
offer some type annotations. To make this point concrete, let's
look at the "PairPoint":

  structure PairPoint =
  struct
    type t = int*int
  
    fun create (x, y): int*int = (x, y)
    fun toPair (t): int*int = t
  end

suppose we add another function "toPair" to convert a "point"
to a pair of integers. Without user-supplied annotations, the
type checker may infer the following existential type for it:

  \exists.{X, {create:int*int->X, toPair:X->X}}

which, nevertheless to say, it too abstract.


---------------------------------------
A final word on the encoding of the existential types. A basic
fact from mathematical logic is that we can encode existential
quantifiers using universal quantifiers, as:

  \exists X.T =~= \not (\forall X.(\not T))                (1)
              =~= (\forall X.(T->FALSE))->FALSE            (2)
         
it's worth remarking that the final step (step 2) makes use of
a basic equation from constructive logic:
  
  \not T = T -> FALSE

we can rewrite formula (2) into:

  \forall Y.(\forall X.(T->Y))-> Y

which encode existential using universal types. Thus we can
encode the two operations on existential types: pack and unpack.

  pack \exists.{T, e} as \exist.{X, T'} ==>  \Lambda Y.\lambda f: \forall X.(T'->Y). f [T] e
  unpack {X, x}=e1 in e2                ==>  e1 [T2] (\Lambda X. \lambda x:T1. e2)