Abstract
We investigate numeral systems in the lambda calculus; specifically in the linear lambda calculus where terms cannot be copied or erased. Our interest is threefold: representing numbers in the linear calculus, finding constant time arithmetic operations when possible for successor, addition and predecessor, and finally, efficiently encoding subtraction—an operation that is problematic in many numeral systems. This paper defines systems that address these points, and in addition provides a characterisation of linear numeral systems.
Keywords
Lambda calculus · Numeral systems · Linearity
1 Introduction
This paper is about numeral systems in the linear \(\lambda \)-calculus. By a numeral system we mean a mathematical notation (in our case linear \(\lambda \)-terms) for representing both numbers and arithmetic operations. By linear we mean that the \(\lambda \)-terms do not copy or erase arguments. For the purpose of this paper, the arithmetic operations that we are interested in are restricted to: successor, addition, predecessor, subtraction and a test for zero.
The \(\lambda \)-calculus is a model of computation that can represent both programs and data. Representing numbers is an essential step towards proving that the calculus is adequate for capturing all computable functions. When the \(\lambda \)-calculus was introduced, Church [2] gave what are now known as Church numerals, which is the most well-known numeral system. This representation supports all the usual arithmetic operations (successor, addition, predecessor, etc.), but some operations are more efficient (in terms of \(\beta \)-reductions) than others. Specifically for Church numerals, the predecessor operation is costly, and we give a summary later to show that other numeral systems can encode this operation in a more efficient way, but at the cost of introducing a less efficient way of computing one of the other operations. Here we are interested in the linear \(\lambda \)-calculus. This is a proper subset of the usual calculus, where explicit copying and erasing (by using variables in the body of a term more than once or not at all, respectively) are not permitted. All functions must use their arguments exactly once, and as a consequence the calculus has extremely weak expressive power—all functions normalise in linear time. In the linear case, \(\beta \)-reduction is a constant time operation, so counting reductions gives a cost metric. It is important to note that if a function is definable in the linear \(\lambda \)-calculus it will be constant or linear time. This makes the linear \(\lambda \)-calculus an interesting framework for encoding arithmetic. However, it comes as no surprise that there is no notion of adequacy for linear numeral systems (the ability to represent all computable functions). The reason for this is that iteration and recursion are not linearly definable. This makes the definition of addition and subtraction a difficult problem that we address in this paper. The aims of this paper are:

to define suitable representations (datatypes) of numerals in the linear \(\lambda \)calculus;
 and to define linear functions over these representations to compute:

successor, addition and predecessor that are constant time;

subtraction and a test for zero that are as efficient as possible.

The main contributions of this paper are:


a series of linear numeral systems, leading to a characterisation of linear numerals;

constant time successor, predecessor and addition operators;

subtraction that has cost min(m, n);

test for zero that is linear in the size of the term.
1.1 Related Work
There has been a great wealth of work on representing data, specifically numeral systems, since the \(\lambda \)-calculus was introduced. The \(\lambda I\)-calculus [1], where abstractions must bind at least one variable, was used early on, and several systems have been adapted to work in this way.
There are many numeral systems in the literature (and many more used as exercises in \(\lambda \)-calculus courses). Apart from the most well-known system of Church already mentioned, we find in the literature a system by Scott (first reported in [3]), and some so-called unusual ones by Wadsworth [10] (also developed by Böhm). With an emphasis on constant time operations, there is the work of Parigot and Rozière [8], which however does not focus on linearity, but is the work closest to ours in spirit.
From a very different perspective, and with a focus on compact representation, there are binary representations [7] that give logarithmic size numbers. However, as we show later, these are not linearly definable. There is also the work of Böhm and Dezani (see for example [9]) where numbers are represented by non-terminating \(\lambda \)-terms, but again these are not possible in the linear \(\lambda \)-calculus.
The works of Rezus [9] and Barendregt [1] contain several other systems that we will not enumerate here, where one of the main concerns is adequacy. Linear systems are never going to be adequate, but for us the focus is on time efficient representations. We provide a selection of systems. Some are closely based on existing systems (for example those defined by Rezus [9]), adapted to a linear setting, and others are new.
1.2 Overview
The rest of this paper is structured as follows. We first recall some background concepts for the \(\lambda \)-calculus and numeral systems. In Sect. 3 we discuss ways of representing numbers generally, so that they can be applied to our setting. In Sect. 4 we define some linear numeral systems, specifically one of the main contributions of the paper: a linear system that allows constant time operations as well as subtraction. In Sect. 5 we give a characterisation of linear numeral systems. We discuss these systems and conclude in Sect. 6.
2 Background
We assume familiarity with standard \(\lambda \)-calculus concepts, including notions of free variables (\(\text {FV}\)), substitution (t[u/x]), \(\beta \)- and \(\eta \)-reduction. We refer the reader to [1, 5] for additional background on these. To fix the syntax we start with a definition.
Definition 1
 1.
\( x \in {\mathscr {X}} \Rightarrow x \in \Lambda _L\) (variable)
 2.
\(t \in \Lambda _L, x \in \text {FV}(t) \Rightarrow (\lambda x.t)\in \Lambda _L\) (abstraction)
 3.
\(t, u \in \Lambda _L, \text {FV}(t)\cap \text {FV}(u) =\varnothing \Rightarrow (t u) \in \Lambda _L\) (application)
In this definition, the free variable constraints ensure linearity of terms by construction. Application associates to the left and abstraction binds as far to the right as possible. Consequently we will economise on parentheses whenever possible. Examples of linear terms include: \(I=\lambda x.x\), \(B=\lambda xyz.x(y z)\), \(C=\lambda xyz.x z y\), and function composition: \(t {\circ } u = B t u\). Notable nonlinear terms are \(S=\lambda xyz.xz(yz)\) and \(K=\lambda xy.x\), which incorporate explicit copying and erasing of arguments respectively.
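The free-variable constraints of Definition 1 can be checked mechanically. The sketch below, with an illustrative nested-tuple encoding of terms (not from the paper), tests linearity of closed terms by requiring every abstraction to bind exactly one free occurrence of its variable:

```python
# A minimal linearity checker for lambda terms, represented as nested
# tuples: ('var', x), ('lam', x, body), ('app', t, u).  The encoding and
# helper names are illustrative assumptions, not from the paper.

def free_occurrences(t, x):
    """Count free occurrences of variable x in term t."""
    tag = t[0]
    if tag == 'var':
        return 1 if t[1] == x else 0
    if tag == 'lam':
        return 0 if t[1] == x else free_occurrences(t[2], x)
    return free_occurrences(t[1], x) + free_occurrences(t[2], x)

def is_linear(t):
    """For closed terms: every abstraction binds exactly one occurrence."""
    tag = t[0]
    if tag == 'var':
        return True
    if tag == 'lam':
        return free_occurrences(t[2], t[1]) == 1 and is_linear(t[2])
    return is_linear(t[1]) and is_linear(t[2])

# I = λx.x and C = λxyz.xzy are linear; K = λxy.x and S = λxyz.xz(yz) are not.
I = ('lam', 'x', ('var', 'x'))
C = ('lam', 'x', ('lam', 'y', ('lam', 'z',
        ('app', ('app', ('var', 'x'), ('var', 'z')), ('var', 'y')))))
K = ('lam', 'x', ('lam', 'y', ('var', 'x')))
S = ('lam', 'x', ('lam', 'y', ('lam', 'z',
        ('app', ('app', ('var', 'x'), ('var', 'z')),
                ('app', ('var', 'y'), ('var', 'z'))))))

assert is_linear(I) and is_linear(C)
assert not is_linear(K) and not is_linear(S)
```

K fails because its y is never used (erasing), and S fails because z is used twice (copying), matching the remark above.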
The \(\lambda I\)-calculus [1] is an intermediate calculus that does not allow erasing, but does allow copying (this is analogous to relevance logic), and we could define an affine calculus that allows erasing but not copying. Thus the linear \(\lambda \)-calculus is the most constrained calculus with respect to resource usage, and is related to linear logic.
In this system the contexts \(\Gamma \) and \(\Delta \) are used to guarantee the linearity constraints. Types will not play much role in the following, but it is worth remembering that all linear terms are typeable.
Definition 2
Linear \(\beta \)-reduction is a constant time operation (there are implementations where substitution has no cost). The number of \(\beta \)-reductions is therefore a reasonable measure of the cost of a computation. \(\eta \)-reduction does not need a side-condition, as there cannot be another occurrence of x in t. Linear terms are stable under \(\beta \)- and \(\eta \)-reduction: if \(t\in \Lambda _L, t \rightarrow u\) then \(u \in \Lambda _L\). We occasionally mention \(\eta \)-reduction, but all reductions will be \(\beta \)-reductions unless explicitly labelled otherwise. The reflexive, transitive closure is denoted \(\rightarrow ^*\), and we write \(t \rightarrow ^n u\) meaning that t reduces to u in exactly n steps.
If no reduction can take place in a term, then the term is a normal form. Reduction in the linear \(\lambda \)calculus is strongly normalising and confluent. Consequently, all terms have a normal form (and therefore also a head normal form). Moreover, all terms are (simply) typeable, and reduction preserves types.
We can now show a number of properties of terms and reduction that will be of use later. First, it is useful to characterise terms in the following way.
Lemma 1
 1.
Each term is a variable, an abstraction term or an application term.
 2.
Each application term is of the form \(t_1 \ldots t_n\), where \(n\ge 2\) and \(t_1\) is not an application term.
 3.
Each abstraction term is of the form \(\lambda x_1 \ldots x_n.t\), where \(n\ge 1\) and t is not an abstraction term.
As an almost direct consequence of Lemma 1, we have the following result about the shape of normal forms. This is a standard result, but it is important for later results so we provide some additional details here:
Lemma 2
Proof
Since t is closed it cannot be a variable. Neither can it be an application term, because if it were, then \(t_1\) in \(t_1\ldots t_n\) would have to be an abstraction, and thus there would be a redex, contradicting that this is a normal form. Thus t must be an abstraction term, say \(\lambda x_1\ldots x_n. u\), \(n\ge 1\). Since u cannot be an abstraction, it must either be a variable, in which case it must be one of \(x_i, 1\le i \le n\), or an application term \(v_1\ldots v_m\), \(m\ge 2\). Now \(v_1\) must be a variable, which again must be one of the \(x_i, 1\le i \le n\), because if it were an abstraction it would not be a normal form. Thus the only possibilities are instances of the above. \(\square \)
A convenient metric on linear terms is the size of the term.
Definition 3
We can now glean some insight into the structure of linear terms, and monitor the structure (and consequently the size) of a term under reduction. The following properties are key to our ideas and have a lot to say in what follows about the representation of numeral systems and the operations that can be encoded.
Lemma 3
 1.
Let t be a closed linear term, then \(|t| = 3k+2\) for some \(k\ge 0\).
 2.
For any linear terms t and u, where \(x\in \text {FV}(t)\), \(|t[u/x]| = |t|+|u|-1\).
 3.
If \(t\rightarrow u\) then \(|t| = |u|+3\). Consequently, if \(t \rightarrow ^n u\), then \(|t| = |u|+3n\).
Proof
 1.
We prove a stronger result: if t is a linear term with \(\text {FV}(t) = \{x_1,\ldots , x_i\}\), then \(|t| = 3k+2-i\) for some \(k\ge 0\). We proceed by induction on the structure of t. If t is a variable, say x, then we have \(i=1\) and the result follows by setting \(k=0\). If t is an application, say (u v), where \(\text {FV}(u) = \{x_1,\ldots , x_m\}\) and \(\text {FV}(v)=\{y_1,\ldots ,y_n\}\), then by the induction hypothesis twice, we have \(|(u v)| = (3k_1 + 2 - m) + (3k_2 + 2 - n) + 1\), which we can reorganise as \(3(k_1 +k_2 + 1) + 2 - (m+n)\) as required. Finally, if t is an abstraction, say \(\lambda x.u\), where \(\text {FV}(u) = \{x,x_1,\ldots , x_n\}\), then by the induction hypothesis we have \(|\lambda x.u| = (3k+2-(n+1))+1\), which simplifies to \(3k+2-n\) as required. The result follows because when t is closed, \(i=0\).
 2.
The unique occurrence of x in t has size 1 which will be replaced by u.
 3.
Each \(\beta \)-reduction \((\lambda x.t)u \rightarrow t[u/x]\) removes an application, an abstraction and a variable. Thus, \(|(\lambda x.t)u| = |t| + |u| + 2\), and \(|t[u/x]| = |t| + |u| - 1\). Note that the same is true for \(\eta \)-reduction.
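Lemma 3 can be checked on concrete terms. The sketch below computes the size measure of Definition 3 on an illustrative tuple encoding of terms (an assumption of this sketch, not the paper's notation):

```python
# The size measure |t|: variables, abstractions and applications each
# count 1.  The tuple encoding ('var', x) / ('lam', x, body) /
# ('app', t, u) is an illustrative assumption.

def size(t):
    """|x| = 1, |λx.t| = |t| + 1, |t u| = |t| + |u| + 1."""
    tag = t[0]
    if tag == 'var':
        return 1
    if tag == 'lam':
        return 1 + size(t[2])
    return 1 + size(t[1]) + size(t[2])

I = ('lam', 'x', ('var', 'x'))
B = ('lam', 'x', ('lam', 'y', ('lam', 'z',
        ('app', ('var', 'x'), ('app', ('var', 'y'), ('var', 'z'))))))

# Lemma 3.1: closed linear terms have size 3k + 2.
assert size(I) == 2 and size(B) == 8
assert size(I) % 3 == 2 and size(B) % 3 == 2

# Lemma 3.3: a beta step removes one application, one abstraction and one
# variable, so (λx.x)(λy.y), of size 5, reduces to λy.y, of size 2.
redex = ('app', I, ('lam', 'y', ('var', 'y')))
assert size(redex) - size(I) == 3
```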
As a corollary, reduction is easily seen to be strongly normalising. This result tells us something very important about representing numbers in the linear \(\lambda \)calculus, and gives us some hints and a start in the process of characterising linear numeral systems that we will return to later in the paper. In particular, it tells us that any encoded number must be proportional in size to the number it represents, and that operations such as addition must be constant in time. Thus compact representations, such as binary numerals, cannot be defined as each successor cannot be represented by a constant operation. We will return to this later in the paper.
We recall now two standard notations for repeated applications.
Definition 4
An important structure that will be used throughout the paper is the following representation of a list of terms, suitably adapted to the linear \(\lambda \)calculus.
Definition 5
(Linear sequences) Let \(t_1, \ldots , t_n\) be linear terms, then \({\langle }{t_1,\ldots , t_n}{\rangle } = \lambda z.z t_1\ldots t_n\) is a sequence of linear terms.
In particular, we note that the empty sequence is \({\langle }{\; }{\rangle } = \lambda z.z\), and the singleton is \({\langle }{t}{\rangle } = \lambda z.z t\). Function composition (\(t \circ u = B t u\)) gives concatenation of sequences in four \(\beta \)-reductions:
Lemma 4
\({\langle }t_1, \ldots , t_n {\rangle } \circ {\langle } u_1, \ldots , u_m {\rangle } \rightarrow ^{4} {\langle } u_1, \ldots , u_m, t_1, \ldots , t_n {\rangle }\)
If we were interested in preserving the order of the elements in the sequence, then we can define a variant of this using \(B' = \lambda xyz.y(x z)\) rather than B. However, we will see that the use we make of this result always joins sequences of the same elements, so the results are equivalent.
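The concatenation of Lemma 4, including the reversed element order, can be observed with ordinary functions. In this sketch, `seq`, `compose` and the `probe` argument-recorder are illustrative helpers (not from the paper); `compose` plays the role of \(t \circ u = B t u\) after its two initial reductions:

```python
# Linear sequences ⟨t1,…,tn⟩ = λz.z t1 … tn, modelled as curried Python
# functions.  `probe` records the arguments it is applied to, so we can
# observe the order of the elements after composition.

from functools import reduce

def seq(*ts):
    """⟨t1,…,tn⟩ = λz.z t1 … tn"""
    return lambda z: reduce(lambda acc, t: acc(t), ts, z)

def compose(t, u):
    """t ∘ u = B t u, which reduces to λz.t(u z)."""
    return lambda z: t(u(z))

def probe(seen=()):
    f = lambda a: probe(seen + (a,))
    f.seen = seen
    return f

s = compose(seq('t1', 't2'), seq('u1', 'u2', 'u3'))
# Lemma 4: ⟨t1,t2⟩ ∘ ⟨u1,u2,u3⟩ →* ⟨u1,u2,u3,t1,t2⟩
assert s(probe()).seen == ('u1', 'u2', 'u3', 't1', 't2')
```

The u-elements come first, exactly as Lemma 4 states; the B' variant would preserve the original order.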
The following is a standard definition of a numeral system that we have adapted for the linear case. We restrict to the case where each numeral is a term in normal form, as the linear \(\lambda \)-calculus is terminating.
Definition 6
 1.
each \(d_n\) is a closed term in normal form;
 2.
there exist terms S (representing successor), and Z (representing the test for zero) and two different terms T and F (representing the Booleans true and false), that satisfy \(S d_n \rightarrow ^*d_{n+1}\), \(Z d_0 \rightarrow ^*T\) and \(Z d_{n+1} \rightarrow ^*F\). Furthermore, we call a numeral system linear when:
 3.
each \(d_n\) is a closed linear term;
 4.
the terms S, Z, T and F are linear;
 5.
addition, predecessor and subtraction are all linearly definable.
The first two points in the above definition are enough to give an adequate numeral system in the full \(\lambda \)-calculus, but we would need to use iteration or recursion to build the arithmetic functions. Because neither iteration nor recursion is linearly definable, it is not possible to get an adequate system. Therefore we have chosen to include in the definition of a linear numeral system the requirement to support linear addition, predecessor and subtraction. As a consequence, several systems that we put forward in this paper are not linear numeral systems according to this definition, and we will call those systems ‘candidates’: the numbers may be represented by linear terms, but some of the arithmetic functions would be non-linear terms.
To summarise, in this paper we are interested in the following arithmetic operations: successor, addition, predecessor, subtraction and test for zero, because these are linearly definable.
3 Representing Numbers
We can represent numbers as algebraic terms (i.e., first-order syntax). But the introduction of binders (i.e., higher order) gives some new ideas. We briefly recall some of these ideas in this section, and then review some well-known numeral systems in the (non-linear) \(\lambda \)-calculus.
3.1 Numbers
What about the predecessor and subtraction functions? These bring additional issues. Consider first the predecessor: \(\mathsf {pred}\; 0 = 0\), \(\mathsf {pred}\; (S n) = n\). With this definition, we can show \(\mathsf {pred}(S x) = x\), but we cannot simplify \(S(\mathsf {pred}\; x)\). This is a problem for constructor systems, and also an issue for us. From these examples we conclude that constant time functions should be possible in some cases.
If addition is like concatenation of lists, then subtraction is like splitting a list. Consequently part of the list will need to be erased, which is not straightforward in a linear system.
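The list analogy can be made concrete with a plain (non-linear) sketch, where a number n is a list of n constructors; the helper names are illustrative:

```python
# Unary numbers as lists of constructors: addition is concatenation and
# subtraction splits the list, discarding part of it — the discarding is
# exactly what a linear term cannot do directly.

def add(m, n):
    return m + n                         # concatenation

def sub(m, n):
    """Truncated subtraction: drop |n| constructors, erasing the rest."""
    return m[:max(len(m) - len(n), 0)]

two, three = ['S'] * 2, ['S'] * 3
assert add(two, three) == ['S'] * 5
assert sub(three, two) == ['S']
assert sub(two, three) == []             # truncated at zero
```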
There are other ways to represent numbers as \(\lambda \)-terms, for example binary representations (see for example [7]). However, the results of this paper show that these representations are not possible in the linear \(\lambda \)-calculus, and we discuss this topic in more detail in Sect. 5. We next review four well-known numeral systems in the \(\lambda \)-calculus. None of these are linear, but we will refer to them later for inspiration.
We assume for the rest of this section the usual (non-linear) \(\lambda \)-calculus, and standard Booleans: \(T = \lambda xy.x\), \(F=\lambda xy.y\). We write \(\mathsf {succ}\), \(\mathsf {add}\), \(\mathsf {pred}\), \(\mathsf {sub}\), and \(\mathsf {zero}\) to represent the successor, addition, predecessor, subtraction and test for zero.
3.2 Church Numerals
Church numerals [2] encode numbers with repeated application: \(\lambda xf.f^n x\). Thus, \(d_0 = \lambda xf.x\), \(d_1 = \lambda xf.f x\), \(d_2 = \lambda xf.f(f x)\), \(\ldots \). Each number is an iterator, where a binder represents the iterated term, and a second binder is a placeholder at the end of the list of applications. Consequently, Church numerals can be seen as lists with a pointer to the end of the list, and thus we can predict that an efficient addition should be possible in this representation, as discussed in Sect. 3.1.

\(\mathsf {succ}\ (\lambda xf.f^n(x)) \rightarrow ^* \lambda xf.f(f^n (x))\)

\(\mathsf {succ}'\ (\lambda xf.f^n(x)) \rightarrow ^* \lambda xf.f^n(f x)\)
Addition is commutative, so \(\mathsf {add}\; a\; b = \mathsf {add}\; b\; a\), and moreover the result is obtained with the same number of reductions. There are other alternatives for predecessor, for instance using pairs, but we would need iteration or recursion for this one. Church numerals are interesting because a wealth of functions can be defined without using recursion, since they have iteration built in. They offer a compact representation, and operations like addition can be efficient (constant time). But on the down side, predecessor, and therefore subtraction, are very expensive.
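Church numerals in the argument order used above, \(\lambda xf.f^n(x)\), can be modelled directly as (non-linear) Python functions. The names `church`, `succ`, `add` and `decode` are illustrative helpers; `succ` follows the first variant shown above:

```python
# Church numerals λxf.fⁿ(x) as curried Python functions (non-linear:
# the numeral uses f many times and may ignore it entirely).

from functools import reduce

def church(n):
    """Build λxf.fⁿ(x) for a Python int n."""
    return lambda x: lambda f: reduce(lambda a, _: f(a), range(n), x)

succ = lambda n: lambda x: lambda f: f(n(x)(f))       # λnxf.f(n x f)
add  = lambda m: lambda n: lambda x: lambda f: m(n(x)(f))(f)

# Read a numeral back by iterating integer increment over 0.
decode = lambda n: n(0)(lambda k: k + 1)

assert decode(church(3)) == 3
assert decode(succ(church(3))) == 4
assert decode(add(church(2))(church(3))) == 5
```

`add` splices the f-chain of one numeral onto the end of the other, which is the "lists with a pointer to the end" intuition above.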
3.3 Wadsworth Systems/Böhm Systems
3.4 Standard Numerals
3.5 Scott Numerals
Scott numerals [3], like standard numerals, have an efficient predecessor operation. In addition, although the terms are not linear, they do not duplicate variables.
3.6 Summary

Each numeral system presented has a good feature, for example constant time addition (Church), constant time predecessor (Scott, Standard), efficient subtraction (Standard).

Each system presented also has a bad feature: Scott numerals need a recursive function to encode addition, and the predecessor in Church numerals (and therefore subtraction) is a very complicated and expensive operation.

Many of the systems cannot encode an operation without the need for recursion machinery. Church numerals have the advantage here, as they have a builtin iterator.

No system presented is linear, and no system has subtraction.
4 Linear Numeral Systems
We now limit ourselves to the linear \(\lambda \)-calculus, and give several possible representations of numbers. Linear numeral systems are not new—some of the earliest systems were linear, and this is because of the popularity of the \(\lambda I\)-calculus. However, the novelty here is that we want to characterise linear systems, and in particular look at efficient ways of computing the test for zero, and also subtraction. Before we can define any, we need some ideas to simulate erasing and to represent Booleans.
4.1 Linear Erasing and Booleans
 1.
Erasing by consuming. A linear \(\beta \)reduction erases a \(\lambda \), an application and a variable, so the size of the term reduces by 3, as shown previously. If we organise our data accordingly, we can use this idea to erase terms. For a trivial example consider \((\lambda x.x)(\lambda y.y)\). This term reduces to \(\lambda y.y\), where \(\lambda x.x\) has been consumed (erased). The challenge therefore is to see if we can do this for any term t: can we find a context \(C[\,]\) such that \(C[t] \rightarrow ^*I\), thus erasing t?
 2.
Erasing by pairing with garbage. We can simulate the nonlinear term \(K=\lambda xy.x\) by first defining the linear term \(K' = \lambda xy.{\langle }{x,y}{\rangle }\). When we apply this to two arguments, it reduces to a pair: \(K' t u \rightarrow ^2 {\langle }{t,u}{\rangle }\). We can think of this as the result, t, together with the garbage that has yet to be erased, u. We can extend this idea so that we can accumulate all the garbage until the end of the computation, and the answer we require is the first component. This approach has previously been studied by Klop [6].
The first approach requires a lot of reduction steps, but can be justified if we consider that this is doing nothing other than explicit garbage collection. Erasing by consuming is based on the following result, which is a variant of solvability. It states that a simple applicative context of identity functions will always cause a closed linear term to reduce to the identity function I. This gives an important observation that linear terms are solvable by linear contexts:
Lemma 5
(Erasing) Let t be a closed linear term. There exists a number n such that \(t I^{\sim n} \rightarrow ^*I\). Moreover, there is a least such number n.
Proof
 1.
If \(n>1\) then we can construct the term \(tI^{\sim n}\). Note that \(|tI^{\sim n}| = |t|+3n\). Now there are n redexes, so this term can reduce to \(I t_1' \ldots t_m'\), where \(t_i'\) is either \(t_i\) or \(t_i\) with I substituted for a free variable. By Lemma 3, Part 3, \(|I t_1' \ldots t_m'| = |t|\). If we now reduce the head redex we get a term that is strictly smaller than t. We can normalise it and conclude by induction.
 2.
Otherwise, \(n=1\) and \(m>0\) and we can construct tI, which reduces in one step to \(I t_1 \ldots t_m\). If we now reduce the head redex we have a smaller term again. We can normalise it and then conclude by induction.\(\square \)
We give an example. Let \(t=\lambda x.x I I I I\), then this term can be erased by building \(tI^{\sim 1}\), i.e., by using the context \(C[\;] = [\;]I\). It is then easy to see that \(C[t] = (\lambda x.x I I I I) I \rightarrow ^*I\). We will say there is a \(\beta \)-collapse for a term t if there is a context \(C[\;]\) that causes the term t to reduce to the identity. Once we have the context, since the size of the term is known, we know how many steps are needed to normalise it.
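The example collapses under ordinary functional evaluation too. A minimal sketch, with I as the Python identity function:

```python
# Erasing by consuming, following the example t = λx.x I I I I:
# applying t to a single I collapses the whole term to I.

I = lambda y: y

t = lambda x: x(I)(I)(I)(I)        # λx.x I I I I

erased = t(I)                      # the context C[ ] = [ ] I
# t I → I I I I I →* I : the result behaves as the identity
assert erased(42) == 42
```

Each application of I to I simply returns I, so the whole chain collapses, mirroring the \(\beta \)-collapse described above.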
Using this idea to cause a \(\beta \)-collapse, we can define some data types. For example, inspired by the encoding of Booleans in the \(\lambda I\)-calculus [1], we can define linear Booleans.
Definition 7
(Linear Booleans) For \(n\ge 0\), we define \(B_n = (T_n,F_n)\), where \(T_n = \lambda x y. y I^{\sim n} x\) and \(F_n = \lambda x. x I^{\sim n}\).
We remark that \(F_n = \lambda x. x I^{\sim n} =_\eta \lambda xy. x I^{\sim n} y \). Using Lemma 5, we have:
Lemma 6
For linear \(\lambda \)-terms t and u, and for some \(n\ge 0\), if \(t I^{\sim n} \rightarrow ^* I\) and \(u I^{\sim n} \rightarrow ^* I\), then for \(m\ge n\): \(T_m t u \rightarrow ^* t\) and \(F_m t u \rightarrow ^* u\).
Thus, if we have enough Is, then we can erase the part of the conditional that is not needed in the result. We now have all we need to start defining linear numeral systems.
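Definition 7's Booleans for \(n=2\) can be exercised directly, taking \(F_2\) in its \(\eta \)-expanded form \(\lambda xy.x I I y\) noted above. In this sketch the branches are two distinct (but erasable) copies of the identity, so we can check which object is selected:

```python
# Linear Booleans for n = 2: T2 = λxy.y I I x and F2 = λxy.x I I y.
# The unused branch must itself collapse under two I's.

I = lambda y: y

T2 = lambda x: lambda y: y(I)(I)(x)
F2 = lambda x: lambda y: x(I)(I)(y)

t = lambda y: y     # two distinct function objects, each erasable:
u = lambda y: y     # t I I →* I and u I I →* I

assert T2(t)(u) is t     # the unused branch is consumed by the I's
assert F2(t)(u) is u
```

The conditional never discards its unused branch; it erases it by consuming it with the supplied I's, exactly as Lemma 6 describes.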
4.2 Numbers
We begin by presenting two numeral systems. The first one is based on Church sequences. Some of the ideas for this system are due to Böhm, and studied in detail in Rezus [9]; the main difference is that we need all the operations to be defined as linear terms. The second system is a variant of Scott numerals where we have access to the end of the list. We examine the shortcomings of each system, then conclude the section by introducing a third system which is a combination of the first two, where we are able to define all the operations.
In the following we write S for successor, A for addition, P for predecessor, M for minus (subtraction), and Z for test-for-zero. We also annotate the names of the functions with underlining and overlining to distinguish between the first two systems, for example \(\overline{S}\), \(\underline{S}\), etc.
4.3 First Candidate
It is easy to see that each numeral is represented by a linear term. We can define the following linear arithmetic operations:
Definition 8
There are some minor variants possible for some of these operations, for instance: \(\overline{S'} = \lambda xy.x(y I)\) and \(\overline{A'} =\lambda xyz.y(x z)\). However, they have no impact whatsoever on the efficiency of the system (the same number of reductions are needed in each case). The key observation is that a number is represented by a chain of applications with access to the first and last element, which is why there are two variants for successor: adding to the left or the right handside of this chain of applications, in a similar way as shown for Church numerals in Sect. 3.2.
Proposition 1
Proof
One nice aspect of this system is the test for zero: any number will collapse to the identity function when applied to just one I: \(\overline{n} I \rightarrow ^*I\), so the erasing of the number is easily simulated.
Corollary 1
 1.
\(\overline{S}\), \(\overline{A}\) and \(\overline{P}\) are all constant time operations.
 2.
\(\overline{Z}\) is not a constant time operation (there exists no constant time \(\overline{Z}\) since the number must be erased).
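The behaviour of this system can be simulated with ordinary functions, assuming the numerals have the shape \(\overline{n} = \lambda z.z I^{\sim n}\) suggested by Remark 1 (\(\overline{0} = I\), \(\overline{n+1} = C\,\overline{n}\,I\)); the successor used is the variant \(\overline{S'} = \lambda xy.x(y I)\) and `probe` is an assumed application counter:

```python
# First-candidate numerals n̄ = λz.z I … I (n copies of I).

from functools import reduce

I = lambda y: y

def num(n):                                    # n̄
    return lambda z: reduce(lambda a, _: a(I), range(n), z)

succ = lambda x: lambda y: x(y(I))             # S̄' = λxy.x(y I)
add  = lambda x: lambda y: lambda z: y(x(z))   # Ā' = λxyz.y(x z)

def probe(count=0):
    """Counts how many times it is applied."""
    f = lambda a: probe(count + 1)
    f.count = count
    return f

assert num(2)(probe()).count == 2
assert succ(num(2))(probe()).count == 3
assert add(num(2))(num(3))(probe()).count == 5

# Test for zero: n̄ I →* I, erasing the numeral entirely.
assert num(3)(I)(42) == 42
```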
Although this system meets many of our requirements, it fails on one aspect: subtraction. The situation is not recoverable because of Lemma 5:
Proposition 2
There is no linear term \(\overline{M}\) such that \(\overline{M}\; \overline{m}\;\overline{n} \rightarrow ^*\overline{m-n}\).
Proof

If \(i=1\), then \(\overline{M} = \lambda x.xt_1 \ldots t_k\), and by linearity each term \(t_1, \ldots , t_k\) must be closed. Now \(\overline{M} \; \overline{m}\;\overline{n} \rightarrow ^*\overline{m} t_1 \ldots t_k \overline{n}\), and by Lemma 5, \(\overline{m} t_1 \rightarrow ^*I\), so \(\overline{m}\) is lost and thus \(\overline{M}\) where \(i=1\) cannot compute subtraction.

If \(i=2\), then there are two cases: \(j=1\) or \(j=2\). If \(j=1\), then \(\overline{M}\) is the term \(\lambda x_1 x_2.x_1 t_1 \ldots t_l \ldots t_k\), where \(1 \le l \le k\), and by linearity each term \(t_1, \ldots , t_l, \ldots , t_k\) must be closed, with the exception of \(t_l\), which has the free variable \(x_2\). Now \(\overline{M} \; \overline{m}\;\overline{n} \rightarrow ^*\overline{m} t_1 \ldots t_l[\overline{n}/x_2] \ldots t_k\). The term \(t_l\), being a normal form, must be of the shape \(x_2 u_1 \ldots u_p\), where \(u_1\) is closed, thus \(\overline{n}\) is lost and thus \(\overline{M}\) where \(i=2\) and \(j=1\) cannot compute subtraction. The case for \(j=2\) is similar.

If \(i=3\), then there are three cases to consider (\(1\le j \le 3\)) which are all similar to the previous case.\(\square \)
Interestingly, there is a linear term \(\overline{M_n}\) such that \(\overline{m} \circ \overline{M_n} \rightarrow ^*\overline{m-n}\). This is given by \(\overline{M_0} = \lambda x.x\) and \(\overline{M_{n+1}} = \lambda yx.x(\overline{M_n} y)\). This has a cost linear in n, and it is one of the very best subtractions for numeral systems. However, it is not so useful here since, as shown in the previous result, we cannot construct the term \(\overline{M_n}\) from \(\overline{n}\) with a linear function. We will come back to this result later in the third attempt at building a linear numeral system.
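The subtraction-by-composition terms can be checked on small instances. This sketch again assumes the numeral shape \(\overline{n} = \lambda z.z I^{\sim n}\) from Remark 1, with `compose` for \(\circ \) and `probe` as an assumed application counter:

```python
# M̄0 = λx.x and M̄n+1 = λyx.x(M̄n y), with m̄ ∘ M̄n →* (m−n)‾.

from functools import reduce

I = lambda y: y
num = lambda n: lambda z: reduce(lambda a, _: a(I), range(n), z)   # n̄
compose = lambda t, u: lambda z: t(u(z))                           # t ∘ u

def M(n):                                      # M̄n
    if n == 0:
        return I
    return lambda y: lambda x: x(M(n - 1)(y))

def probe(count=0):
    f = lambda a: probe(count + 1)
    f.count = count
    return f

# 5̄ ∘ M̄2 behaves as 3̄: the probe is applied exactly three times.
assert compose(num(5), M(2))(probe()).count == 3
```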
Remark 1
All the linear systems that we define can also be expressed using the linear combinators C, B and I. For example, the system above can be written as: \(\overline{0} = I\), \(\overline{1} = CII\), \(\overline{2} = C(CII)I, \ldots , \overline{n+1} = C \overline{n} I\). Then each of the linear operators can be written using these combinators, for example \(\overline{S} = C(B C(B I))I\).
4.4 Second Candidate
Our second attempt to build a linear numeral system is based on alternating abstractions and applications. In fact, this system builds terms in a similar style to Scott numerals, but has a pointer to the end. This way of representing numbers was also found by Parigot and Rozière [8], although linearity was not a concern in their case, so the operations are different. However, in this system there is no way to erase a number, so the test for zero will have to be simulated in a specific way using pairs, as discussed earlier in Sect. 4.1. We will harness this idea in the next system to allow numbers (and pairs) to be erased.
Here we can see that each successor is represented by an abstraction, a variable and an application (\(\lambda x.x\,{-}\)), where \({-}\) is the rest of the number represented as a list. Just like in the previous linear system, we have access to the first and last element of the list, so we can add to either end of this list of successors. Each numeral is represented by a linear term, and the following operations are also linear:
Definition 9
Proposition 3
Proof
All cases are shown by straightforward \(\beta \)-reductions. \(\square \)
\(\underline{Z}\) in this system does not meet our requirement fully, since the result is a pair. It allows us to simulate the test for zero, and if we had enough I’s, then we would be able to consume the number. It does, however, meet all the other requirements, and in particular, we now have subtraction. The situation is again not recoverable:
Proposition 4
There is no linear term \(\underline{Z}\) such that \(\underline{Z}\; \underline{n} \rightarrow ^*T_2/F_2\)
Proof
Wadsworth [10] has shown that in any numeral system the test for zero has a specific form: \(\underline{Z} = \lambda x_1\ldots x_i .x_1t_1 \ldots t_k\), where \(i\ge 1\), \(k\ge 0\), and the principal variable is the first bound variable. So \(\underline{Z} \underline{n} \rightarrow ^*\lambda x_2 \ldots x_i.\underline{n}t_1\ldots t_k\). Since \(\underline{n}\) is of the form \(\lambda yx.x(\lambda x.x(...xy)...)\), there are \(n+1\) abstractions, so if \(k< n\) part of n remains untouched. Since \(\underline{n}\) cannot be known in advance it will not reduce to the representation of a Boolean. \(\square \)
Corollary 2
 1.
\(\underline{S}\), \(\underline{A}\) and \(\underline{P}\) are all constant time operations.
 2.
The test for zero is linear, but we have a pair as a result.
4.5 Third Candidate
Proposition 5
 1.
If \(m = n\) then \(\overline{m} \circ \underline{n} \rightarrow ^*I\)
 2.
If \(n < m\) then \(\overline{m} \circ \underline{n} \rightarrow ^*\overline{m-n}\)
 3.
If \(m < n\) then \(\overline{m} \circ \underline{n} \rightarrow ^*\underline{n-m}\)
Proof
 1. By induction on n. When \(n=0\), \(I \circ I \rightarrow ^*I\) (base case). Next assume \(\overline{k} \circ \underline{k} \rightarrow ^*I\); we show the case for \(k+1\):$$\begin{aligned} \begin{array}{lcl} \overline{k+1} \circ \underline{k+1} &{}= &{}\lambda z.(\lambda x.\overline{k}(x I))((\lambda yx.x(\underline{k} y))z) \\ &{}\rightarrow &{}\lambda z.(\lambda x.\overline{k}(x I))(\lambda x.x(\underline{k} z)) \\ &{}\rightarrow &{} \lambda z.\overline{k}((\lambda x.x(\underline{k} z))I) \\ &{}\rightarrow &{} \lambda z.\overline{k}(\underline{k} z) \end{array} \end{aligned}$$Thus \(\overline{k+1} \circ \underline{k+1} \rightarrow ^*\lambda z.\overline{k}(\underline{k} z)\), and so by the induction hypothesis, \(\lambda z.\overline{k}(\underline{k} z) \rightarrow ^*I\) as required.
 2. The other two cases are a consequence of the following three results, which can be shown in the same way as the inductive step above.
 (a)
\(\overline{m+1} \circ \underline{n+1} = \overline{m} \circ \underline{n}\)
 (b)
\(\overline{m-n} \circ \underline{0} = \overline{m-n}\)
 (c)
\(\overline{0} \circ \underline{nm} = \underline{nm}\) \(\square \)
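Since both candidate systems are pure λ-terms, the reductions of Proposition 5 can be simulated directly with Python closures. This is an illustrative sketch rather than the paper's definitions verbatim: `under` and `over` reconstruct \(\underline{n}\) and \(\overline{n}\) from the shapes appearing in the proof above (\(\underline{0}=\overline{0}=I\), \(\underline{k+1}=\lambda yx.x(\underline{k}\,y)\), \(\overline{k+1}=\lambda x.\overline{k}(x\,I)\)); the helper `count` is ours.

```python
I = lambda x: x  # the identity, representing zero in both systems

def under(n):
    """Second-system numeral: 0 = I, k+1 = λy.λx. x (k y)."""
    return I if n == 0 else (lambda y: lambda x: x(under(n - 1)(y)))

def over(n):
    """First-system numeral: 0 = I, k+1 = λx. k (x I)."""
    return I if n == 0 else (lambda x: over(n - 1)(x(I)))

compose = lambda f, g: lambda z: f(g(z))

def count(t, probe):
    """Peel an under-style residue by repeatedly applying I, counting layers."""
    n = 0
    while t is not probe:
        t, n = t(I), n + 1
    return n

probe = object()
# m = n: the composition collapses to the identity.
assert compose(over(4), under(4))(probe) is probe
# m < n: over(m) cancels m successors, leaving under(n - m).
assert count(compose(over(1), under(4))(probe), probe) == 3
```

Only the cases \(m \le n\) are exercised on a first-order probe: in the case \(m > n\) the residue \(\overline{m-n}\) would try to apply the probe itself, which a Python value cannot support.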
This result gives a way to erase fully in the second system, but more importantly, it gives a way to compute subtraction in min(m, n) reductions. We harness this idea to build a third numeral system that is a combination of the previous two, by building pairs of numbers. Representing integers using pairs is not a new idea. A number n is a pair (a, b), which we interpret as \(a-b\). It is then possible to define operations over these pairs, for example: \((a,b)+_p(c,d) = (a+c,b+d)\) and \((a,b)-_p(c,d) = (a+d,b+c)\). Thus we can add two pairs (\(+_p\)) using addition on numbers, and we can also subtract pairs (\(-_p\)) using addition. We will make use of a variant of this idea and build the pair \((n,n)\), where each component is represented in a different numeral system. The operators for the first two systems, together with Proposition 5, then allow all the functions to be computed. We remark, however, that storing the number twice means that there is some redundancy in the system. It is also worth pointing out that Wadsworth’s system needed two copies of the number for the test for zero. The same is true for this system, but since we are in a linear framework we need to keep hold of the second number, and no \(\eta\)-collapse is needed.
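The pair arithmetic sketched above can be made concrete. This is a minimal Python sketch (the names `add_p`, `sub_p` and `value` are ours, not the paper's):

```python
# Integers as pairs (a, b), read as the difference a - b.
def add_p(p, q):
    (a, b), (c, d) = p, q
    return (a + c, b + d)       # (a,b) +_p (c,d) = (a+c, b+d)

def sub_p(p, q):
    (a, b), (c, d) = p, q
    return (a + d, b + c)       # (a,b) -_p (c,d) = (a+d, b+c): subtraction via addition only

def value(p):
    a, b = p
    return a - b

assert value(add_p((2, 0), (3, 0))) == 5
assert value(sub_p((2, 0), (3, 0))) == -1   # negatives representable for free
```

Note that `sub_p` never subtracts component-wise; it only ever adds, which is exactly why this encoding is attractive in a setting where addition is the cheap operation.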
Definition 10
Note that we use Z from the second system here, and we now have enough I’s to fully erase the terms from the first system.
Proposition 6
Proof
Corollary 3

Successor, addition and predecessor are all constant time operations.

Test for zero is a linear time operation (it requires that we fully erase a number to give a Boolean result).

The subtraction function requires min(m, n) reductions to compute \([\![ {m-n} ]\!]\).
5 Characterising Linear Numeral Systems
Thus, we have the same successor and zero as the second candidate, but constructed differently. In this representation, the following operations are also linear:
Definition 11
We remark that successor, predecessor and subtraction are beautifully simple, and we can encode the test for zero with pairs as done before. However, this time there is no addition.
Proposition 7
There is no linear term A such that \(A\; m\; n \rightarrow ^*m+n\).
The reason for this is the same as why there is no constant time addition for the algebraic data type for numbers given earlier, or similarly for the concatenation of linked lists: in all cases, we need to traverse the first number (or list) to join it to the second one. Not being able to do addition is, of course, a failure of this system, but addition can be defined outside the linear framework (by using an encoding of recursion, as for Scott numerals, for instance).
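The list analogy can be made concrete. Below is an illustrative Python sketch (our own encoding, not from the paper) of numbers as linked lists: successor allocates a single cell in constant time, but addition must walk down the whole first argument before it can attach the second, so its cost is proportional to m.

```python
ZERO = None                      # the empty list represents 0

def succ(n):
    return ("S", n)              # constant time: one new cell

def add(m, n):
    # must traverse all of m; cost proportional to m, never constant
    return n if m is None else ("S", add(m[1], n))

def to_int(n):
    k = 0
    while n is not None:
        n, k = n[1], k + 1
    return k

assert to_int(add(succ(succ(ZERO)), succ(ZERO))) == 3
```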
We have now collected a number of “failed” linear numeral systems, and we can start to ask which systems do have addition, which have predecessor, which have subtraction, and so on, and then try to understand when this arises.

each \(\lambda \) binds one occurrence of a variable;

each application has disjoint free variables;

each variable is bound by a \(\lambda \).

Zero is represented as a closed linear term of size c, where \(c=2+3k\) for some \(k\ge 0\). We have \(k=0\) in all the systems presented in this paper so far, because the representation of zero is just \(\lambda x.x\), which has size 2.

Each successor causes the size to increase by \(3k'\), where \(k'\ge 1\) is the number of abstractions (or variables, or applications) used. The smallest value of \(k'\) is 1, which we have used several times already for the first, second and fourth candidates for a linear numeral system. For the third system, we used \(k'=2\).

The size of a numeral n is therefore given by: \(2+3k+3nk'\), for some \(k\ge 0, k'\ge 1\).
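The size formula can be checked mechanically. The sketch below (our own AST encoding) builds the normal forms \(\lambda yx.x(\lambda x.x(\ldots xy)\ldots)\) of the second candidate, where \(k=0\) and \(k'=1\), and counts abstraction, application and variable nodes; each numeral n then has size \(2+3n\).

```python
def size(t):
    """Every variable, abstraction and application node counts 1."""
    if t[0] == "var":
        return 1
    if t[0] == "abs":                        # ("abs", name, body)
        return 1 + size(t[2])
    return 1 + size(t[1]) + size(t[2])       # ("app", fun, arg)

def numeral(n):
    """Normal form of the second candidate: 0 = λx.x, n = λy.λx. x(λx. x(...x y...))."""
    if n == 0:
        return ("abs", "x", ("var", "x"))
    body = ("var", "y")
    for _ in range(n - 1):
        body = ("abs", "x", ("app", ("var", "x"), body))
    return ("abs", "y", ("abs", "x", ("app", ("var", "x"), body)))

# k = 0, k' = 1: the size formula 2 + 3k + 3nk' specialises to 2 + 3n
assert all(size(numeral(n)) == 2 + 3 * n for n in range(20))
```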
There are, however, additional choices. In the rightmost two configurations above, where there are two edges connecting the abstraction and application nodes, we can split one of them to make alternative constructions. There is not much choice: we cannot have a list of abstractions, as there is no way to erase. We may try to build systems like this, for example:
Some of the above are useful if we pair them with the first system, since this can erase the abstraction, but none of these can lead to a linear numeral system. We can exhaust all possibilities and establish that there is no linear numeral system when \(k=0\), \(k'=1\); thus the first linear numeral system has characteristic values \(k=0\), \(k'=2\).
The following proposition summarises the points above.
Proposition 8

The size of the term representing the number n is proportional to n. Specifically, n would be represented by a term of size \(nk + c\).

Successor, addition and predecessor will be constant time operations.

Test for zero and subtraction will be linear time operations.
Proof

Let c be the size of the representation of zero. Each successor must add a term of the same size, say k, which gives the result immediately.

The successor S of a number n will be built from an application \(S\,n\), where \(|S\,n| = |S| + |n| + 1\). Since each \(\beta\)-reduction reduces the size of a term by 3, we know that the extra structure needed to represent \(n+1\) must come from the term S. Thus not only must this be a constant number of \(\beta\)-reductions, but we also know the size of the term S. The same reasoning applies to the addition and predecessor operations.

Erasing a term can never be a constant time operation and therefore must be linear. Subtraction and the test for zero both need to erase part of a number or a whole number.\(\square \)
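The size argument in the proof can be checked on a concrete term. This sketch (our own minimal AST, with naive `subst` and `beta` helpers; no capture-avoidance, so we keep all bound names distinct) performs one β-step on a linear redex: because the bound variable occurs exactly once, substituting the argument removes one application, one abstraction and one variable node, so the term shrinks by exactly 3.

```python
def size(t):
    if t[0] == "var":
        return 1
    if t[0] == "abs":                        # ("abs", name, body)
        return 1 + size(t[2])
    return 1 + size(t[1]) + size(t[2])       # ("app", fun, arg)

def subst(t, x, u):
    """Replace the (single) occurrence of variable x in t by u."""
    if t[0] == "var":
        return u if t[1] == x else t
    if t[0] == "abs":
        return ("abs", t[1], subst(t[2], x, u))
    return ("app", subst(t[1], x, u), subst(t[2], x, u))

def beta(t):
    """One step: (λx.body) arg  →  body[arg/x]."""
    assert t[0] == "app" and t[1][0] == "abs"
    return subst(t[1][2], t[1][1], t[2])

# (λx. x y) (λz. z): a linear redex of size 7; its reduct has size 4
redex = ("app",
         ("abs", "x", ("app", ("var", "x"), ("var", "y"))),
         ("abs", "z", ("var", "z")))
assert size(redex) - size(beta(redex)) == 3
```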
Because of this characterisation, there cannot be a more compact representation of numbers in the linear \(\lambda \)-calculus. In particular, we cannot represent a linear version of binary numerals (see for example [7]).
6 Conclusion
One of the main results in the \(\lambda \)-calculus is that numerals can be defined, together with successor, predecessor and test for zero functions, and the system is adequate to represent all computable functions. The numeral systems presented in this paper can also be understood as numeral systems in the usual \(\lambda \)-calculus, and replacing the linear Booleans by the standard ones immediately eliminates the inefficiency issues associated with tests for zero. In this way, all the systems defined here are adequate. Building numeral systems in a constrained calculus allowed us to gain insight into numeral systems generally. Interestingly, the linearity constraints not only limit our choices but also ensure that the operations are efficient: it is not possible to define a non-constant time addition, for instance.
We have shown that the linear \(\lambda \)-calculus can be used to represent numbers in an efficient way. Specifically, successor, predecessor and addition are all constant time functions; moreover, they have to be constant time. Subtraction, frequently omitted in numeral systems and especially cumbersome in Church-style representations, can also be done efficiently. The graphical representation of linear terms has been useful in identifying new representations.
One aspect of this work, and indeed one of the reasons for investigating linear numeral systems in the first place, was to find a more efficient representation of the predecessor and subtraction operations for System F [4], potentially by finding new ways to represent the data. All linear terms are typeable, so all numbers, and all operations we have presented in this paper, also have a type. However, each number has a different type (recursive types are therefore needed), so this work does not offer any improvement for System F. We leave this aspect, however, for future investigation.
References
 1.Barendregt, H.P.: The lambda calculus: its syntax and semantics. In: Studies in Logic and the Foundations of Mathematics, vol. 103, second, revised edn. North-Holland Publishing Company (1984)
 2.Church, A.: The Calculi of Lambda-Conversion. No. 6 in Annals of Mathematics Studies. Princeton University Press, Princeton (1941)
 3.Curry, H.B., Feys, R.: Combinatory Logic, Volume I. North-Holland (1958). (Second printing 1968)
 4.Girard, J.-Y., Lafont, Y., Taylor, P.: Proofs and Types. In: Cambridge Tracts in Theoretical Computer Science, vol. 7. Cambridge University Press (1989)
 5.Hankin, C.: An Introduction to Lambda Calculi for Computer Scientists. In: Texts in Computing, vol. 2. King’s College Publications (2004)
 6.Klop, J.W.: Combinatory Reduction Systems. In: Mathematical Centre Tracts, vol. 127. Mathematisch Centrum, 413 Kruislaan, Amsterdam (1980)
 7.Mogensen, T.Æ.: An investigation of compact and efficient number representations in the pure lambda calculus. In: Bjørner, D., Broy, M., Zamulin, A.V. (eds.) Perspectives of System Informatics, 4th International Andrei Ershov Memorial Conference, PSI 2001, Akademgorodok, Novosibirsk, Russia, July 2–6, 2001, Revised Papers, Lecture Notes in Computer Science, vol. 2244, pp. 205–213. Springer (2001)
 8.Parigot, M., Rozière, P.: Constant time reductions in lambda-calculus. In: Borzyszkowski, A.M., Sokolowski, S. (eds.) Mathematical Foundations of Computer Science 1993, 18th International Symposium, MFCS’93, Gdansk, Poland, August 30 – September 3, 1993, Proceedings, Lecture Notes in Computer Science, vol. 711, pp. 608–617. Springer (1993)
 9.Rezus, A.: Lambda-conversion and logic. Ph.D. thesis, University of Utrecht (1981)
 10.Wadsworth, C.P.: Some unusual \(\lambda \)-calculus numeral systems. In: Seldin, J.P., Hindley, J.R. (eds.) To H.B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism, pp. 215–230. Academic Press, London (1980)
Copyright information
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.