.. title: A short primer on type theory
.. slug: a-short-primer-on-type-theory
.. date: 2019-08-23 
.. tags: 
.. category: 
.. link: 
.. description: 
.. type: text
.. has_math: true

.. raw:: html

   <style> .red {color:red} </style>
   
.. role:: red


*Type theory* is a theory in mathematics that uses terms and types instead of
elements and sets to formalize mathematics, meaning that all of mathematics can
be built upon type theory (therefore replacing set theory in this regards; we
say that type theory can act as a `foundation of mathematics
<https://en.wikipedia.org/wiki/Foundations_of_mathematics>`_.)

.. TEASER_END

Operations in type theory are formalized using another very interesting theory
known as Category Theory and have interesting connections to logic via the
Curry-Howard isomorphism, but these are topics for another day. Let's now give
an overview of the basics of the theory. 

How does it work?
=================

Type theory builds upon the untyped lambda calculus adding in types as an extra
feature.  Interestingly, this apparently simple addition makes the lambda
calculus a lot more interesting and powerful. Given that, understanding basic
lambda calculus is a prerequisite for understanding type theory. Don't worry, I
will not go into much detail here though.

Untyped lambda calculus
=======================

Lambda calculus is a rewriting system for *terms*. This means that a term
\\(t\\) is successively converted - according to some specified rules of
conversion - to other terms \\(t_1\\), \\(t_2\\), \\(t_3\\), etc... until
conversion is no longer possible using the given rules (we say that we've
reached the normal form.) Note that the reduction may not always halt (see the
`Y-Combinator <https://www.youtube.com/watch?v=9T8A89jgeTI>`_ to learn about
recursion in the lambda calculus.) However we will not consider this in depth
since the introduction of types to the lambda calculus will cause all valid
terms to reach a unique normal form. So throughout this document we will assume
that all terms reach a unique normal form.

Terms
------

Now what exactly are terms, you may ask? A term is any syntactically correct
expression. The idea is that numbers, booleans, functions, etc... are terms and
certain combinations of such terms as also terms. Here are some examples of
terms from common mathematics (not directly applicable to lambda calculus as you
will see later:) :math:`a + 1`, :math:`f(x, y)`, 3, :math:`f`, etc...

Formally, we say that this is a recursive definition. These kinds of definitions
are very common within computer science, so you may like to read more about
them.

In general, we consider **atomic** terms which are things like numbers, booleans and
such, and compound terms, including lambda **abstractions** and function
**applications**.

Lambda abstractions are actually just functions. For those with previous
experience in imperative programming, lambda abstractions are sometimes called
anonymous functions since they exist without needing to be named in the sense
that they are different from function-like procedures in imperative programming
languages. If you don't have any exposure to imperative programming, then you
don't have to worry about that. Note that in the lambda calculus, functions are
considered values in and of themselves and can be manipulated like any other
term.

As a notation, we write abstractions as such:

::

    λ input . output

For instance, a function that increments a number may be written as such 
``λ n . n + 1``, and the identity function which returns its input as is can be
written as ``λ x . x``.

Functions can be applied by simply writing their arguments to their right
therefore producing an application term. For instance: ``(λ x . x + 1) 2`` is an
application term that will reduce to 3.

Notice, that if ``f`` is a function, then ``f a`` is an application term. This is
written in a different style that is usual in mathematics and programming, where
it would be written as :math:`f(a)`.

Reduction rules
---------------

The reduction rules of lambda calculus are really simple. If your term is an
abstraction or atomic, then it's normal and you can't do any reductions to it.
If instead your term is an application, then you must do β-reduction (pronounced
'beta reduction') to it. This simply means that you take your argument (input)
and plug it in every position it should be in within the body of the lambda
expression.

- Atomic → Done
- Abstraction → Done
- Application → β-reduction

This ``(λ x . x + 1) 2`` will reduce - via β-reduction - to ``2 + 1``, so you untangle
the lambda revealing the body where every occurrence of the argument variable
(known as a bound variable) is replaced by its value.

There are two traditions to beta reduction:
1. Call-by-value: you evaluate (reduce) the argument to the function and then you apply beta reduction.
2. Call-by-name: you do beta reduction without reducing the argument, then you reduce what you get.  This leads to lazy evaluation which has many interesting properties.

Example - encoding of boolean algebra:

::

  True  = (λ x (λ y . x))
  False = (λ x (λ y . y))
  And   = (λ b1 (λ b2 ((b1 b2) False)))
  (and False False) reduces to (λ x (λ y . y)) which is False
  (and False True ) reduces to False
  (and True False) reduces to False
  (and True  True ) reduces to True

Type theory
###########

In type theory, each term has a type, and that is marked by the following
*judgement*:

::

  x : A

read as "x is of type A." Notice that this is an intrinsic property of x and not
an extrinsic property (elements belonging to sets.) Although it might look
syntactically nearly identical, it does imply a different, constructive view of
mathematics. One never considers a term without its type. This is a very subtle
difference and it may take some time for one to really see it.

Given this view, we define some common operations as primitives on types that
allow us to construct new types out of other types.  Product

The *product* of two types is very similar to the Cartesian product of two sets in
set theory and it is defined as follows:

If ``x : A``` and ``y : B``, then the pair ``(x, y) : A x B`` where ``A x B``
(generally read as '``A`` times ``B``') is the product type of ``A`` with ``B`` and ``(x, y)``
is a single term made up of two sub-terms.

This means that given two terms of types ``A`` and ``B``, you can always construct a
term of type ``A x B``.  Reciprocally, given a term of type ``A x B``, you can always
produce a term of type ``A`` and a term of type ``B``. This is done using the two
projections: ``π¹`` which extracts the first element and ``π²`` which extracts the
second element of a pair. Note: this π has nothing to do with the number
3.14159... and it means :red:`p`\ rojection.

For ``(x, y) : A x B`` we have

::

  π1 (x, y) = x : A π2 (x, y) = y : B

Functions
---------

As can be seen in the part on the untyped lambda calculus, functions are also
terms. A function with arguments of type ``A`` and output of type ``B`` is of type ``A ->
B`` (may be read as '``A`` to ``B``'.) As such, we have the following:

::

  (λ (x : A) . (y : B)) : A -> B

There is a way to get a term of type ``B`` out of terms of type ``A`` and of type 
``A -> B``. And that's done by function application.

For ``x : A f : A -> B`` we always have ``f x : B``.

As you can see, unlike the product, it is necessary to have two terms of the
correct types to get a term of type ``B``.  Sum

The sum type (also known as the co-product, a nomenclature reminiscent of the
fact that the sum type is the dual of the product type in category theory) is
the type of terms that belong to either one of two types. This may sound
confusing as stated. The sum type is often called the disjoint union, or better
yet, the tagged union of the types. Again, this is reminiscent of set theory.
Basically, if ``x : A`` and ``y : B`` then ``x`` (tagged with its origin ``A``) is of type ``A + B``
(the sum type of ``A`` and ``B``) and so is ``y`` (tagged with its origin ``B``.)

What exactly is meant by tags, you may ask? Well, this is an example of how it's
done in Haskell.

::

  data Either a b = Left a \| Right b

In Haskell, ``Either a b`` is how you write ``A + B``. You may not know Haskell, so
here's a simple explanation of what's happening. We are defining the data type
(for our intents, simply a type) ``Either a b``. Elements of this type look like
``Left x`` where ``x : a`` or (the pipe \|) ``Right y`` where ``y : b``.  Here ``Left`` and ``Right`` are
simply wrappers around the value. They could have been named Alice and Bob or
Eeny and Meeny, or just anything really, but ``Left`` and ``Right`` express the fact
that ``A`` is written on the left side of ``Either`` and ``B`` on its right, so as not to
get confused writing the type. (although ``A + B`` and ``B + A`` are technically
isomorphic, they're not identical so it's essential to tell a programming
language whether it's ``A + B`` or ``B + A``.)

``Left`` and ``Right`` are the tags in 'tagged union.' You can now also see why it's
called a disjoint union: ``A`` and ``B`` stay separate using the tags.

The sum type is a very interesting and surprisingly very useful type that has
only recently got significant attention. Notice that there is no way to extract,
from an arbitrary ``t : A + B``, a value ``x : A`` or ``y : B``, but case enumerations allow
for extractions of terms of types A or B depending on the value of ``t``. In
Haskell, this is done with pattern matching as follows:

::

  extractA           :: Either a b -> Maybe a extractA (Left x)  = Just x extractA
  (Right y) = Nothing
  
  extractB           :: Either a b -> Maybe b extractB (Left x)  = Nothing
  extractB (Right y) = Just y

Notice the use of the ``Maybe`` type here. This is because extractions can fail. The
``Maybe`` type is a wrapper around a type ``a``, such that it can allow for no values to
be returned meaning failure (``Nothing``).

::

  data Maybe a = Just a | Nothing

As mentioned in `Bartosz Milewski's <https://bartoszmilewski.com/2015/01/13/simple-algebraic-data-types/>`_ article on that, the ``Maybe`` type is actually a
sum type itself.  ``Maybe A = A + Unit`` where Unit is the type of exactly one term,
which is why ``Maybe`` is sometimes known as ``+1``. But we will not talk about that
today.

An interesting thing to notice is that it takes 1 term of type ``A x B`` to get a
term of type ``A`` and term of type ``B``, 2 terms of types ``A`` and ``A -> B`` are necessary
to get a term of type ``B``, and no number of terms of type ``A + B`` is enough to
get any elements of ``A`` or elements of ``B`` (meaning that you are not free to choose
to retrieve an ``A`` or a ``B`` at will.)

This is all there is to say in this post. Keep on looking for more posts,
because very interesting posts will follow!
