A short primer on type theory
Type theory is a branch of mathematics that uses terms and types, instead of elements and sets, to formalize mathematics. This means that all of mathematics can be built upon type theory, therefore replacing set theory in this regard; we say that type theory can act as a foundation of mathematics.
Operations in type theory are formalized using another very interesting theory known as category theory, and they have deep connections to logic via the Curry-Howard isomorphism, but these are topics for another day. Let's now give an overview of the basics of the theory.
How does it work?
Type theory builds upon the untyped lambda calculus, adding types as an extra feature. Interestingly, this apparently simple addition makes the lambda calculus a lot more interesting and powerful. Because of that, understanding basic lambda calculus is a prerequisite for understanding type theory. Don't worry, though: I will not go into much detail here.
Untyped lambda calculus
Lambda calculus is a rewriting system for terms. This means that a term \(t\) is successively converted, according to some specified rules of conversion, to other terms \(t_1\), \(t_2\), \(t_3\), and so on, until conversion is no longer possible using the given rules (we say that we've reached the normal form). Note that the reduction may not always halt (see the Y combinator to learn about recursion in the lambda calculus). However, we will not consider this in depth, since the introduction of types to the lambda calculus will cause all valid terms to reach a unique normal form. So throughout this document we will assume that all terms reach a unique normal form.
Terms
Now what exactly are terms, you may ask? A term is any syntactically correct expression. The idea is that numbers, booleans, functions, etc. are terms, and certain combinations of such terms are also terms. Here are some examples of terms from common mathematics (not directly applicable to lambda calculus, as you will see later): \(a + 1\), \(f(x, y)\), \(3\), \(f\).
Formally, we say that this is a recursive definition. These kinds of definitions are very common within computer science, so you may like to read more about them.
In general, we consider atomic terms which are things like numbers, booleans and such, and compound terms, including lambda abstractions and function applications.
Lambda abstractions are actually just functions. For those with previous experience in imperative programming: lambda abstractions are sometimes called anonymous functions, since they exist without needing to be named, which sets them apart from function-like procedures in imperative programming languages. If you don't have any exposure to imperative programming, you don't have to worry about that. Note that in the lambda calculus, functions are values in and of themselves and can be manipulated like any other term.
As a notation, we write abstractions as follows:
λ input . output
For instance, a function that increments a number may be written as λ n . n + 1, and the identity function, which returns its input as is, may be written as λ x . x.
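For comparison, here is a minimal sketch of the same two abstractions written as anonymous functions in Haskell (the names increment and identity are mine, chosen for illustration):

-- λ n . n + 1 as a Haskell lambda
increment :: Int -> Int
increment = \n -> n + 1

-- λ x . x as a Haskell lambda
identity :: a -> a
identity = \x -> x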
Functions can be applied by simply writing their arguments to their right, therefore producing an application term. For instance, (λ x . x + 1) 2 is an application term that will reduce to 3.
Notice that if f is a function, then f a is an application term. This style differs from the usual one in mathematics and programming, where it would be written as \(f(a)\).
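The same convention carries over to Haskell, where application is also written by juxtaposition. A small sketch (all names are hypothetical):

-- applying a lambda directly to its argument, as in (λ x . x + 1) 2
three :: Int
three = (\x -> x + 1) 2   -- reduces to 3

-- applying a named function f to a is written f a, not f(a)
double :: Int -> Int
double n = 2 * n

six :: Int
six = double 3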
Reduction rules
The reduction rules of lambda calculus are really simple. If your term is an abstraction or atomic, then it is normal and no reduction applies to it. If instead your term is an application, then you apply β-reduction (pronounced 'beta reduction') to it. This simply means that you take your argument (input) and plug it into every position where it belongs within the body of the lambda expression.
Atomic → Done
Abstraction → Done
Application → β-reduction
Thus, (λ x . x + 1) 2 will reduce, via β-reduction, to 2 + 1: you untangle the lambda, revealing the body, where every occurrence of the argument variable (known as a bound variable) is replaced by its value.
There are two traditions to beta reduction:

1. Call-by-value: you evaluate (reduce) the argument to the function and then you apply beta reduction.
2. Call-by-name: you do beta reduction without reducing the argument, then you reduce what you get. This leads to lazy evaluation, which has many interesting properties, as the sketch below illustrates.
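Here is a minimal Haskell sketch of why the distinction matters (Haskell actually uses call-by-need, an optimized form of call-by-name; the names loop and constFive are mine):

-- a term with no normal form: reducing it never halts
loop :: Int
loop = loop

-- a function that ignores its argument
constFive :: Int -> Int
constFive _ = 5

main :: IO ()
main = print (constFive loop)  -- prints 5; under call-by-value this would never halt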
Example - encoding of boolean algebra:
True = (λ x . (λ y . x))
False = (λ x . (λ y . y))
And = (λ b1 . (λ b2 . ((b1 b2) False)))

(And False False) reduces to (λ x . (λ y . y)), which is False
(And False True) reduces to False
(And True False) reduces to False
(And True True) reduces to True
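This encoding can be transcribed almost verbatim into Haskell. Below is a sketch (it uses the RankNTypes extension so that the encoded booleans get a single reusable type; all names are mine):

{-# LANGUAGE RankNTypes #-}

-- a Church boolean selects one of two alternatives
type ChurchBool = forall a. a -> a -> a

true :: ChurchBool
true x _ = x

false :: ChurchBool
false _ y = y

-- And = (λ b1 . (λ b2 . ((b1 b2) False)))
churchAnd :: ChurchBool -> ChurchBool -> ChurchBool
churchAnd b1 b2 = b1 b2 false

-- convert back to the built-in Bool, e.g. for printing
toBool :: ChurchBool -> Bool
toBool b = b True False

main :: IO ()
main = print (toBool (churchAnd true true))  -- prints True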
Type theory
In type theory, each term has a type, and that is marked by the following judgement:
x : A
read as "x is of type A." Notice that this is an intrinsic property of x and not an extrinsic property (elements belonging to sets.) Although it might look syntactically nearly identical, it does imply a different, constructive view of mathematics. One never considers a term without its type. This is a very subtle difference and it may take some time for one to really see it.
Given this view, we define some common operations as primitives on types that allow us to construct new types out of other types.

Product
The product of two types is very similar to the Cartesian product of two sets in set theory and it is defined as follows:
If x : A and y : B, then the pair (x, y) : A x B, where A x B (generally read as 'A times B') is the product type of A with B, and (x, y) is a single term made up of two sub-terms.
This means that given two terms of types A and B, you can always construct a term of type A x B. Reciprocally, given a term of type A x B, you can always produce a term of type A and a term of type B. This is done using the two projections: π1, which extracts the first element of a pair, and π2, which extracts the second. Note: this π has nothing to do with the number 3.14159...; it stands for projection.
For (x, y) : A x B we have:

π1 (x, y) = x : A
π2 (x, y) = y : B
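In Haskell, the product of a and b is the pair type (a, b), and the standard Prelude functions fst and snd play the role of π1 and π2. A small sketch (the names pair, first and second are mine):

pair :: (Int, Bool)
pair = (3, True)

first :: Int
first = fst pair   -- the projection π1: yields 3

second :: Bool
second = snd pair  -- the projection π2: yields True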
Functions
As can be seen in the part on the untyped lambda calculus, functions are also terms. A function with arguments of type A and output of type B is of type A -> B (which may be read as 'A to B'). As such, we have the following:
(λ (x : A) . (y : B)) : A -> B
There is a way to get a term of type B out of a term of type A and a term of type A -> B, and that's done by function application. For x : A and f : A -> B, we always have f x : B.
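In Haskell this reads the same way. A minimal sketch (the names are mine):

s :: String                 -- x : A
s = "type theory"

lengthOf :: String -> Int   -- f : A -> B
lengthOf = length

n :: Int                    -- f x : B
n = lengthOf s              -- 11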
As you can see, unlike the product, it is necessary to have two terms of the correct types to get a term of type B.

Sum
The sum type (also known as the co-product, a nomenclature reminiscent of the
fact that the sum type is the dual of the product type in category theory) is
the type of terms that belong to either one of two types. This may sound
confusing as stated. The sum type is often called the disjoint union, or better
yet, the tagged union of the types. Again, this is reminiscent of set theory.
Basically, if x : A and y : B, then x (tagged with its origin A) is of type A + B (the sum type of A and B), and so is y (tagged with its origin B).
What exactly is meant by tags, you may ask? Well, here is an example of how it's done in Haskell.
data Either a b = Left a | Right b
In Haskell, Either a b is how you write A + B. You may not know Haskell, so here's a simple explanation of what's happening. We are defining the data type (for our intents, simply a type) Either a b. Elements of this type look like Left x where x : a, or (hence the pipe |) Right y where y : b. Here Left and Right are simply wrappers around the value. They could have been named Alice and Bob, or Eeny and Meeny, or just anything really, but Left and Right express the fact that a is written on the left side of Either and b on its right, so as not to get confused writing the type. (Although A + B and B + A are technically isomorphic, they're not identical, so it's essential to tell a programming language whether it's A + B or B + A.)
Left and Right are the tags in 'tagged union.' You can now also see why it's called a disjoint union: A and B stay separate thanks to the tags.
The sum type is a very interesting and surprisingly useful type that has only recently received significant attention. Notice that there is no way to extract, from an arbitrary t : A + B, a value x : A or y : B, but case analysis allows for the extraction of terms of type A or B depending on the value of t. In Haskell, this is done with pattern matching as follows:
extractA :: Either a b -> Maybe a
extractA (Left x) = Just x
extractA (Right y) = Nothing

extractB :: Either a b -> Maybe b
extractB (Left x) = Nothing
extractB (Right y) = Just y
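For instance, assuming the definitions above are loaded in GHCi, one would observe (a hypothetical session):

extractA (Left 3)     -- Just 3
extractA (Right 'c')  -- Nothing
extractB (Right 'c')  -- Just 'c'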
Notice the use of the Maybe type here. This is because extractions can fail. The Maybe type is a wrapper around a type a that allows for no value to be returned, meaning failure (Nothing).
data Maybe a = Just a | Nothing
As mentioned in Bartosz Milewski's article on the subject, the Maybe type is actually a sum type itself: Maybe A = A + Unit, where Unit is the type of exactly one term, which is why Maybe is sometimes known as +1. But we will not talk about that today.
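That observation can be checked directly in Haskell, where Unit is written (). A sketch of the two directions of the isomorphism (the names toEither and fromEither are mine):

-- Maybe a is isomorphic to Either a ()
toEither :: Maybe a -> Either a ()
toEither (Just x) = Left x
toEither Nothing = Right ()

fromEither :: Either a () -> Maybe a
fromEither (Left x) = Just x
fromEither (Right _) = Nothing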
An interesting thing to notice is that it takes 1 term of type A x B to get a term of type A and a term of type B; 2 terms, of types A and A -> B, are necessary to get a term of type B; and no number of terms of type A + B is enough to get a term of type A or a term of type B (meaning that you are not free to choose whether to retrieve an A or a B at will).
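The last point is visible in Haskell's standard eliminator for Either, which forces you to handle both cases instead of letting you pick one. The Prelude function either has type (a -> c) -> (b -> c) -> Either a b -> c; the usage below is a sketch (the name describe is mine):

-- to consume an Either, both alternatives must be handled
describe :: Either Int String -> String
describe = either (\n -> "a number: " ++ show n) (\s -> "a string: " ++ s)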
This is all there is to say in this post. Keep an eye out, because more very interesting posts will follow!