Computing Math and Beauty

Implementing functional programming language (Part 1) - Untyped Lambda Calculus

01 Nov 2015

Writing an interpreter is an educating experience. It teaches a lot about our favourite programming languages, gives us mental models to work out quirks and geniuses in it. Though it feels like lot of work, implementing a pure functional language is not very difficult. Though making it performant might be. In this series we would try to implement a typed ML style programming language. We would also talk about some of the design decisions and alternatives for the same. Would try to keep the formalism as minimal as possible, but introduce them as and when needed. Let's get started.

Lambda calculus

Almost all FP languages have their basis on lambda calculus, which is rather simple. The computation is modelled with just one operation - function application (or substitution). Hence basic constructs are the ones which are just necessary to define and apply functions nothing more. And nothing less. Infact we could do any computation that could be done in a computer using lambda calculus, which only has a way to define function, which calls other functions.

Thus lambda expression can be inductively (recursively) defined to be either

Term variables: Just like variables in programming languages, except here they are immutable. We usually use small letters (x,y,z...) to represent term variables.
Lambda abstraction: Which is a fancy way of saying defining a function. A function in lambda calculus is usually represented as λx.M where x is any term variable and M is the body of the function which can be any lambda term (which in turn can be a term variable, application or abstraction)
Lambda application: Call a function. If you have function x and y just juxta-posing them next to one other is calling is like x y.

Examples

The identity function is defined as λx.x which takes an argument and gives back the same. We never name any functions, all functions are anonymous, which is why most programming languages call anonymous functions as lambda expressions.

Another example is (λx.x) y which is an application of y as an argument to the identity function. The result of the application is y. But what is y? Unlike programming languages lambda calculus allows unbound variables since it allows some formalizations to be much simpler.

So how do we define functions with multiple arguments? We use a technique called Currying (named after Haskell Curry). For instance if we are to define a function which ignores the second argument and gives back the first. We define a function on one argument which gives a function which takes another argument but ignores it, like., λx.λy.x

Booleans

With our informal notion of function application, I would make the claim that almost anything thats computable can be expressed this way. In order to grasp the sense of such a small and beautiful system, we would write a function for finding whether a number is even or odd.But where are numbers? booleans? In any computational model we encode them into the system. We encode numbers as binary in a system which understands binary. Similarly we encode numbers and booleans as functions here.

While choosing an encoding it would be nice if we could make some of the operations in it easier. For instance with booleans we usually make conditionals ie., choose one over other. Thus we define,

TRUE: λx.λy.x
FALSE: λx.λy.y

See TRUE chooses the first argument and FALSE chooses the second argument, which is more like if-then-else statement behaves in any programming language. We are encoding values using the most useful action we perform on them.

We would now define some useful functions on booleans now.

NOT: λb.b (λx.λy.y) (λx.λy.x)
AND: λa.λb.a b (λx.λy.y)

Incase of NOT if b is FALSE it chooses the second argument, which is TRUE, and if b is FALSE it chooses the first argument, which is FALSE

Similarly AND if a is FALSE it doesnt care about b and returns FALSE (the second argument when calling a) and if a is TRUE it returns b which can be TRUE or FALSE.

We could similarly define other boolean functions OR, XOR etc...

Numbers

Now we define numbers similarly. If n is a number the most useful operation is to repeat something n times. Hence we define numbers as

0: λf.λx.x
1: λf.λx.f x
2: λf.λx.f (f x)
.
.

Note : Applications are left associative by default so ffx means (ff)x and not f(fx).

This method of encoding is called as Church numeral (Named after Alonzo Church). So numbers are nothing but functions which take a function and an argument and applies the function n times on that argument. We can define some of the useful functions on numbers too. Like.,

SUCCESSOR: λn.λf.λx.f(n f x)

Apply f one more time after applying n times.PREDECESSOR is bit difficult. Church almost gave up on this, but finally figured out, after all it's turing complete. We are not going to define it here though, would leave it as a fun challenge. I would jump to SUM

SUM: λa.λb.λf.λx.a f (b f x)

Apply a times f on the result of applying b times f on x or using SUCCESSOR like,

SUM: λa.λb. a SUCCESSOR b

Though the right way is to replace SUCCESSOR with the expression, i used this form to explain what i mean. Similarly we can use SUM to define PRODUCT.

PRODUCT: λa.λb. a (SUM b) 0

Now we get to our original problem of defining a function which finds whether a given number is even or odd. The approach we will use is - given any number n if we alternate between TRUE and FALSE n times starting with TRUE, we can find whether the number is even or odd. Hence

EVEN: λn. n NOT TRUE

Apply NOT n times on the value TRUE. It works like this

0: TRUE
1: NOT(TRUE)
2: NOT(NOT(TRUE)
.
.

To be precise

EVEN: λn.n(λb.b(λx.λy.y)(λx.λy.x))(λx.λy.x)

That would have given an idea of the power of lambda calculus. Note that there are multiple occurences of x and y each with its own scope. We would formalize this in next part along with substitution, hich would allow us to implement a small interpreter for the same.

Implementation notes

The implementation of the interpreter for our programming language would be done with Rust. I will drop all the code in this github repository.

We need to define the Abstract syntax representation for our language, which would directly correspond to the inductive definition of lambda terms.

enum Lambda {
    Var(String),
    Abstraction(String, Box<Lambda>),
    Application(Box<Lambda>, Box<Lambda>)
}