Differentiating Datastructures

The Algebra of Datatypes (for Non-Functional Programmers)

Many programming languages support the construction of new types from old. For example in 'C' we may define

typedef struct {
        A a;
        B b;
} AtimesB;
An object of type AtimesB contains both an object of type A and an object of type B. We will use a shorthand where we write this type as A.B or simply AB. In C we can also introduce a new type as follows:

typedef union {
        A a;
        B b;
} AplusB;
An AplusB contains either an object of type A or an object of type B. Unfortunately the union type in C has a major flaw. We can't actually tell whether or not an object of type AplusB contains an A or a B. In principle, however, we can add flag that indicates which it is. In C++ the boost library provides a boost variant type that maintains this information transparently. We can use a declaration like the following:

typedef boost:variant<A,B> AplusB;
We'll carry on working with unions and pretend that they do contain a flag like a C++ variant.

As shorthand we will write this type as A+B. There is a slight subtlety with the type A+A. It's tempting to say an object of this type is clearly just an A so you might expect that A+A=A. However, you can think of a sum of two types as having two slots, one of the type of the left argument and one of the type of the right argument. Precisely one slot is filled and there is a flag to say which it is. In the case of A+A, although both slots have the same type, there is still the flag to indicate which of the two slots is filled.

Note that this algebraic notation makes sense and many of the standard rules of arithmetic have interpretations in terms of these types. For example A+B=B+A and A.B=B.A correspond to the fact that we can change the order of the members of the aboves structs and unions without making any essential difference. Consider also A.(B+C)=A.B+A.C. This also has a straightforward interpretation: an A and something that is either a B or a C is the same as an A and a B or an A and a C. Although one is a struct containing a variant and the other is a variant containing a struct it is trivial to write a short piece of code that converts one to the other. In fact, this is what we interpret = to mean - that there is a simple operation to convert the type on the left to the type on the right and vice versa. The types are isomorphic.

0 is, by definition, a type that no object can have. (Imagine in C++ making the constructor private so nobody can use it.) The total number of objects of type 0 is therefore zero. Consider the type B=A+0. An object of type B is either an object of type A or an object of type 0. By definition it can't be of type 0 so it must be of type A. So B=A+0=A. Similarly 0+A=A.

1 is a type of which only one object exists. (These are often called singleton objects.) All objects of type 1 are equal to each other. So consider the type 1+1. Either one slot or the other is filled. Whichever it is it can only contain one value, the one object of type 1. On the other hand there is the flag indicating which of the two slots contains this unique object. So an object of type 1+1 can be in one of two possible states. In other words there are two objects of type 1+1 and it's natural to name it 2. Any binary flag type is isomorphic to the type 2. Similarly we can define types corresponding to all of the integers by repeated addition of 1. The type called n, where n is a natural number, is a type of which there exist precisely n possible objects. In C the enum type corresponds to these. For example

typedef enum {
} Three;
defines a type with three possible values and hence is isomorphic to 3.

Now we can see even more correspondences with ordinary algebra. Consider the type A.1. It contains both an object of type A and the one object of type 1. As we already know what the object of type 1 is (there is only one after all) it is redundant. So A.1=A. Consider an object of type A+A. Whether it is the first or second slot that is filled we know the object contained in the slot is of type A. So an object of type A+A consists of a flag and an object of type A. So we can rewrite this as 2.A. So A+A=2.A as we might expect.

We will also use the power notation Xn as shorthand for X.X...n times...X. We find that the usual algebraic relationships hold. For example it's a straightforward exercise to see that (X+Y)2=X2+2.X.Y+Y2. In fact, Xn is nothing other than an array of n elements declared in C as:

typedef X XpowerN[n];
There is another class of types we can introduce in C - the functions. For example

A f(B);
declares a function that takes as argument an object of type B and returns an object of type A. We write this type as A^B or AB. Note that we have used the power notation to mean something different to what we used it for above where it represents arrays. It turns out, however, that these notations are completely compatible. Consider, for example, the type A3. This is the type of a function that can be evaluated on three different values and for each value it can take a value of type A. In other words we can completely specify the function by making an array containing each of the three different values. In other words A3=A3 where on the left we mean an array and on the right we mean a function. So we can use this notation without any risk of ambiguity. Note also that identities like 23=8 make perfect sense.

The Analysis of Datatypes

We have introduced 2 concrete types above: 0 and 1 and have shown how to construct other types out of them using addition, multiplication and exponentiation. However, using these types we can only construct types with finite numbers of objects. In practice we often want to work with types that have an infinity of objects. For example the type consisting of binary trees, or lists of binary flags, or even arbitrary integers. Consider for example the list. A popular definition in C for a list of objects of type A is

typedef struct {
        A head;
        Alist *tail;
} Alist;
A list consists of a head and a tail. However there is a little C trick being used here. C has a special pointer, the null pointer, and that can be used to indicate the empty list with zero elements. If we didn't have such an object we would have to represent an Alist using something like

typedef union {
        One empty;
        struct {
                A head;
                Alist *tail;
        } headAndTail;
} Alist;
Let's look at this closely. As described above we consider unions to contain information about which slot is filled. If the empty slot is filled we consider the list to be empty (in which case we don't care what the value of empty is because there is only one value it could take). If the other slot is filled then we have a non-empty list with head and tail. We can write this very succinctly as:

L = 1+A.L
So we have an algebraic equation defining L. More generally we can consider lists containing different types of objects so L is really a function of A.

L[A] = 1+A.L[A]
In C++ we'd call L[.] a template.

Now consider what forms a list can take. It can consist of a list with no elements, or a list with one element of type A, or two elements of type A and so on. On other words we can think of L[A]=1+A+A2+A3+.... (Note that we haven't defined what we mean by an infinite sum so just read this as suggestive.) We can get this in a different way. From the above equation we have

L[A]-A.L[A]=1 (We haven't defined minus but let's pretend we can do this)
L[A]=1/(1-A) (We haven't defined division either)
    =1+A+A2+A3+... (By the binomial expansion)
So amazingly, even though we have used operations that don't make much sense here, we have succeeded in extracting meaningful results. There are ways of making these arguments rigorous but that isn't my intention here. In general subtraction and division are not allowed.

Let's consider a different type: binary trees where the leaves contain objects of type A. A binary tree is either a leaf, in which case it contains an object of type A, or it contains two child trees. In other words we can write

T[A] = A+T[A]2
We might wish to work with binary trees that are just trees i.e. don't contain any data in their leaf nodes. These are objects of type T=T[1] where T=1+T2.

There is a remarkable identity we can prove about such binary trees [1]:

T7 = T6.T = T6.(1+T2) = T6+T8
   = T5+T7+T8
   = 1+T+T2+T4
   = 1+T+T3
   = 1+T2
   = T (Details form an exercise.)
In other words the type of binary trees is isomorphic to the type of arrays of 7 trees! It is possible to write a simple piece of code that maps every array of 7 binary trees uniquely to a single tree in a nice way and vice versa.

Notice that we can also compose datatypes. So using the above notation L[T[X]] is the type of lists of trees of X's.

The Calculus of Datatypes [2]

Many computer languages define the notion of a pointer. For example given a doubly linked list we can obtain a pointer to the first element and then walk back and forth through the list by following the pointers associated with each element.

In some programming languages, however, in particular the pure functional languages such as Haskell, there is no such thing as a pointer. Often this doesn't matter as algorithms that operate on lists have no need for anything like a pointer. They can be written recursively in terms of two functions: head(.) which returns the first element of the list and tail(.) which returns the list with the first element removed. Unfortunately there come times when it is more efficient to be able to specify a position in the list and walk backwards and forwards through it. We need to find a way to express a pair consisting of a list and a position in the list in a way that doesn't use pointers. An obvious choice is to represent the position simply as an integer representing a position in the list. To walk left and right we decrement and increment it. But to actually find the value at the nth position entails computing the tail of the list n-1 times.

Here's an approach: to represent a list l of type L[X] and a pointer to the nth position, use a triple consisting of (A,x,B) where x is the value of the nth element of the list (of type X), A is the sublist of l from the first up to the nth element stored backwards and B is the sublist starting at the n+1-th position and ending at the end of l. We can define left(A,x,B)=(tail(A),head(A),x::B) (where a::b means the list with a as head and b as tail). right(.) is defined similarly. Note how we can now walk back and forth in the list easily by iterating left(.) and right(.). So these triples serve nicely to replace the original list with a pointer into it. The type of these triples is X.L2. Notice the X factor in that. We can drop that by considering instead the type of 'lists with a pointer to an excised element'. This set is simply L2. So how, in general, do we get from a type F[X] to the type F'[X] coresponding to an F[X] with an X excised from it?

Consider first the datastructure Y=X. There is only one possibility, the pointer has to point at the one instance of an X in Y. So F'[X]=1.

Consider the case Y=A.X where A is another datastructure not containing an X. Since the excised X can't be in the A then there's only one place it can be. So F'[X]=A.1=A.

Now consider Y=A[X]+B[X]. The Y is either an A or a B so the excised X is either an A' or a B'. So F'[X]=A'[X]+B[X]'.

Now consider the product structure. F[X]=A[X].B[X] is a pair consisting of an A or a B each of which may contain an X. An excised X lies in A or in B. If it lies in A then that part is represented by A' and the B is untouched. i.e. we have A'[X]B[X]. Conversely we might have A[X]B'[X]. So F'[X]=A[X]B'[X]+A'[X]B[X].

Lastly consider composed types such as lists of trees. An object of type F[G[X]] is an F of G's of X's. To excise an X from this we must first choose which of the G's in the F contain the excised X and then excise an X from it. The first choice is F'[G[X]] and the second is a G'[X]. Our datastructure contains both of these and therefore is of type F'[G[X]].G'[X].

This should be looking familiar. These are the familiar Leibniz rule and chain rule for differentiating algebraic expressions.

Let's try some concrete classes. Consider the datatype that is a triple of objects of type X. In other words F[X]=X3. The derivative is 3.X2. Now consider how you would represent the excision of an X from a triple. First you'd need to indicate which of the three members had been excised. Then you need to represent the other two members of the triple. In other words 3.X2.

Now we'll try a more complex example, the list. We have

L[X] = 1+X.L[X]
so using the usual rules of calculus

L'[X] = 0+1.L[X]+X.L'[X]
      = L[X]+X.L'[X]
Let's do a suggestive computation of L'[X] from this:

L'[X] = L[X]+X.L'[X]
L'[X].(1-X) = L[X]
L'[X] = L[X]/(1-X)
      = L[X]2 (using the derivation of L[X]=1/(1-X) above)
This is exactly the result we had above. In other words it appears that the derivative procedure works with list types too.

Let's try a more complex example, the binary tree.

T[X] = X+T[X]2
T'[X] = 1+2.T[X].T'[X]
So an object of type T'[X] is either a trivial object of type 1 or a triple consisting of a binary flag, a tree and a tree with an excised X. In fact, these datastructures are well known in functional programming circles as zippers.

X.F'[X] is the second term in the Taylor expansion of F[X]. It's actually not that surprising that this represents a type with a marked element. Suppose the type F[X+D] can be expanded in the form

F[X+D] = F0[X]+F1[X].D+F2[X].D2+F3[X].D3+...
where the Fi[X] are independent of D.

Consider what we mean by F[X+D]? It's a container of type F[X] where some of the X's have been replaced by D's. It could be none, in which case it is of type F[X]. It could have one D. In that case it must be of the form F1[X].D. But an object of type F[X] with one X excised and replaced by a D is F'[X].D. So the second term is exactly what we'd expect. Similarly the third term must correspond to two distinct Xs replaced by Ds. From calculus we'd expect 2.F2[X]=F''[X] and so on. So now we know how to compute the type of an F[X] with n distinct X's excised.


[1] Seven Trees in OneAndreas Blass
[2] The Derivative of a Regular Type is its Type of One-Hole Contexts Conor McBride