
1 Introduction

Computational complexity classes, and in particular those relating to polynomial time and space [11, 20], capture the concept of a feasible problem, and as such have been scrutinized with great care by the scientific community over the last fifty years. The fact that even apparently simple problems, such as proving nontrivial separations between those classes, remain open today has highlighted the need for a comprehensive study of the deep nature of computational complexity. The so-called implicit computational complexity [4, 8, 13, 30, 33] fits into this picture: it is concerned with characterizations of complexity classes based on tools from mathematical logic and the theory of programming languages.

One of the areas involved in this investigation is certainly that of term rewriting [34], which has proved useful as a tool for the characterization of complexity classes. In particular, the class \(\texttt{FP}\) (i.e., of polytime first-order functions) has been characterized through variations of techniques originally introduced for termination, e.g., the interpretation method [29, 31], path orders [15], or dependency pairs [16]. Some examples of such characterizations can be found in [1, 3, 7, 9, 10].

After the introduction of \(\texttt{FP}\), it became clear that the study of computational complexity also applies to higher-order functionals, which are functions that take not only data but also other functions as inputs. The pioneering work of Constable [12], Mehlhorn [32], and Kapron and Cook [22] laid the foundations of the so-called higher-order complexity, which remains a prolific research area to this day. Some motivations for this line of work can be found e.g. in computable analysis [24], NP search problems [6], and programming language theory [14].

There have been several proposals for a class of type-two functionals that correctly generalizes \(\texttt{FP}\); the most widely accepted one is the class \(\texttt{BFF}\) of basic feasible functionals. This class can be characterized by Cobham-style function algebras, but it can also be described using oracle Turing machines. Over the years, the research community has introduced a variety of further characterizations of \(\texttt{BFF}\), e.g., in terms of programming languages with restricted recursion schemes [14, 21], typed imperative languages [17, 18], and restricted forms of iteration in OTMs [23]. An investigation of higher-order complexity classes employing the higher-order interpretation method (in the context of a pure higher-order functional language) was also proposed in [19]. However, that paper does not provide a characterization of the standard \(\texttt{BFF}\) class. Instead, it characterizes a newly proposed class \( \texttt{SFF}_2 \) (Safe Feasible Functionals), defined as the restriction of \(\texttt{BFF}\) to argument functions in \(\texttt{FP}\) (see Sect. 4.2 and the conclusion of [19]).

The studies cited above present structurally complex programming languages and logical systems, precisely due to the presence of higher-order functions. It is not currently known whether it is possible to give a characterization of \(\texttt{BFF}\) in terms of mainstream concepts of rewriting theory, although the latter has long been known to provide tools for the modeling and analysis of functional programs with higher-order functions [25].

This paper goes precisely in that direction by showing that the interpretation method in the form studied by Kop and Vale [26, 27] provides the right tools to characterize \(\texttt{BFF}\). More precisely, we consider a class of higher-order rewriting systems admitting cost–size tuple interpretations (with some mild upper-bound conditions on their cost and size components) and show that this class contains exactly the functionals in \(\texttt{BFF}\). Such a characterization could not have been obtained employing classical integer interpretations as e.g. in [9] because \(\texttt{BFF}\) crucially relies on some conditions both on size and on time. This is the main contribution of our paper, formally stated in Theorem 2.

We believe that a benefit of this characterization is that it opens the way to effectively handling programs or executable specifications implementing \(\texttt{BFF}\) functions, in full generality. For instance, we expect that such a characterization could be integrated into rewriting-based tools for complexity analysis of term rewriting systems such as e.g. [2].

Our result is proved in two parts. We first prove that if any term rewriting system in this class computes a higher-order functional, then this functional has to be in \(\texttt{BFF}\) (soundness). Conversely, we prove that all functionals in \(\texttt{BFF}\) are computed by this class of rewriting systems (completeness). We argue that the key ingredient towards achieving this characterization is the ability to split the dual notions of cost and size given by the usage of tuple interpretations.

2 Preliminaries

2.1 Higher-Order Rewriting

We roughly follow the definition of simply-typed term rewriting system [28] (STRS): terms are applicative, and we limit our interest to second-order STRSs where all rules have base type. Reductions follow an innermost evaluation strategy.

Let \( \mathbb {B}\) be a nonempty set whose elements are called base types and are ranged over by \( \iota , \kappa , \nu \). The set \( {\mathbb {T}(\mathbb {B})}\) of simple types over \( \mathbb {B}\) is defined by the grammar \( {\mathbb {T}(\mathbb {B})}{:}{=}\mathbb {B}\mid {\mathbb {T}(\mathbb {B})}\Rightarrow {\mathbb {T}(\mathbb {B})}\). Types from \( {\mathbb {T}(\mathbb {B})}\) are ranged over by \( \sigma , \tau , \rho \). The \( \Rightarrow \) type constructor is right-associative, so we write \( \sigma \Rightarrow \tau \Rightarrow \rho \) for \( {( \sigma \Rightarrow ( \tau \Rightarrow \rho ) )} \). Hence, every type \( \sigma \) can be written as \( \sigma _1 \Rightarrow \cdots \Rightarrow \sigma _n \Rightarrow \iota \). We may write such types as \( \vec {\sigma } \Rightarrow \iota \). The order of a type is: \( \texttt{ord}(\iota ) = 0 \) for \( \iota \in \mathbb {B}\) and \( \texttt{ord}(\sigma \Rightarrow \tau ) = \max (1 + \texttt{ord}(\sigma ), \texttt{ord}(\tau )) \). A signature \( \mathbb {F}\) is a triple \( {(\mathbb {B}, \varSigma , \texttt{typeOf})} \) where \( \mathbb {B}\) is a set of base types, \( \varSigma \) is a nonempty set of symbols, and \( \texttt{typeOf}: \varSigma \longrightarrow {\mathbb {T}(\mathbb {B})}\). For each type \( \sigma \), we assume given a set \( \mathbb {X}_\sigma \) of countably many variables and assume that \( \mathbb {X}_\sigma \cap \mathbb {X}_\tau = \emptyset \) if \( \sigma \ne \tau \). We let \( \mathbb {X}\) denote \( \cup _\sigma \mathbb {X}_\sigma \) and assume that \( \varSigma \cap \mathbb {X}= \emptyset \).
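To make the notion of type order concrete, the following Python sketch (an illustration of ours, with an ad hoc encoding) models simple types as nested pairs and computes \( \texttt{ord} \):

```python
# A minimal model of simple types (our sketch, not the paper's notation):
# a base type is a string, an arrow type sigma => tau is the pair (sigma, tau).
def order(ty):
    """ord(iota) = 0 for base types; ord(sigma => tau) = max(1 + ord(sigma), ord(tau))."""
    if isinstance(ty, str):
        return 0
    sigma, tau = ty
    return max(1 + order(sigma), order(tau))

nat = "nat"
nat2nat = (nat, nat)                          # nat => nat, order 1
funcprod_ty = (nat2nat, (nat, (nat, nat)))    # (nat => nat) => nat => nat => nat, order 2
```

For instance, the type of \( \textsf{funcProd}\) in Example 1 below has order 2, which is exactly the bound imposed on signatures in this paper.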

The set \( \textsf{T}(\mathbb {F},\mathbb {X})\) — of terms built from \( \mathbb {F}\) and \( \mathbb {X}\) — collects those expressions \( s\) for which a judgment \( s\mathrel {:}\sigma \) can be deduced using the following rules:

$$ \frac{x \in \mathbb {X}_\sigma }{x \mathrel {:}\sigma } \qquad \qquad \frac{\textsf{f}\in \varSigma \quad \texttt{typeOf}(\textsf{f}) = \sigma }{\textsf{f}\mathrel {:}\sigma } \qquad \qquad \frac{s \mathrel {:}\sigma \Rightarrow \tau \quad t \mathrel {:}\sigma }{s\,t \mathrel {:}\tau } $$

As usual, application of terms is left-associative, so we write \( s\,t\,u\) for \( {((s\,t) \,u)} \). Let \(\texttt{vars}(s)\) be the set of variables occurring in \(s\). A term \( s\) is ground if \( \texttt{vars}(s) = \emptyset \). The head symbol of a term \( \textsf{f}\,s_1 \cdots s_n \) is \( \textsf{f}\). We say \( t\) is a subterm of \( s\) (written \( s\mathrel {\unrhd }t\)) if either (a) \( s= t\), or (b) \( s= s' \,s'' \) and \( s' \mathrel {\unrhd }t\) or \( s'' \mathrel {\unrhd }t\). It is a proper subterm of \( s\) if \( s\ne t\). For a term \( s \), \( \texttt{pos}(s) \) is the set of positions in \( s \): \( \texttt{pos}(x) = \texttt{pos}(\textsf{f}) = \{ \sharp \} \) and \( \texttt{pos}( s \,t) = \{ \sharp \} \cup \{ 1 \cdot u \mid u \in \texttt{pos}(s) \} \cup \{ 2 \cdot u \mid u \in \texttt{pos}(t) \} \). For \( p \in \texttt{pos}(s)\), the subterm \( s|_p \) at position \(p\) is given by: \( s|_\sharp = s \) and \( (s_1 \,s_2)|_{i \cdot p} = s_i|_p \).
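The position and subterm functions can be illustrated by a small Python sketch (our own encoding: atoms are strings, an application \( s\,t \) is the pair \( (s, t) \), and \( \sharp \) is written `"#"`):

```python
# Positions and subterms of applicative terms (a sketch of ours).
def pos(s):
    """pos(s): {"#"} for atoms; prefix "1"/"2" for the two application subterms."""
    if isinstance(s, str):
        return {"#"}
    l, r = s
    return {"#"} | {"1" + p for p in pos(l)} | {"2" + p for p in pos(r)}

def subterm_at(s, p):
    """s|_p: "#" addresses s itself; "1"/"2" descend into an application."""
    if p == "#":
        return s
    return subterm_at(s[0] if p[0] == "1" else s[1], p[1:])

s = (("add", "0"), "y")   # the term add 0 y
# pos(s) = {"#", "1#", "11#", "12#", "2#"} and s|_{11#} = "add"
```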

In this paper, we require that for all \( \textsf{f}\in \varSigma \), \( \texttt{ord}(\texttt{typeOf}(\textsf{f})) \le 2 \), so w.l.o.g., \( \textsf{f}\mathrel {:}(\vec {\iota }_1 \Rightarrow \kappa _1) \Rightarrow \cdots \Rightarrow (\vec {\iota }_k \Rightarrow \kappa _k) \Rightarrow \nu _1 \Rightarrow \cdots \Rightarrow \nu _l \Rightarrow \iota \). Hence, in a fully applied term \( \textsf{f}\,s_1 \ldots s_k \,t_1 \ldots t_l \) we say the \( s_i \) are the arguments of type-1 and the \( t_j \) are the arguments of type-0 for \( \textsf{f}\). A substitution \( \gamma \) is a type-preserving map from variables to terms such that \({\{ x\in \mathbb {X}\mid \gamma (x) \ne x\}}\) is finite. We extend \( \gamma \) to terms as usual: \( x\gamma = \gamma (x) \), \( \textsf{f}\gamma = \textsf{f}\), and \( (s\,t) \gamma = (s\gamma ) \,(t\gamma ) \). A context \( C \) is a term with a single occurrence of a variable \( \Box \); the term \( C[s] \) is obtained by replacing \( \Box \) by \( s \).

A rewrite rule \( \ell \rightarrow r \) is a pair of terms of the same type such that \( \ell = \textsf{f}\,\ell _1 \cdots \ell _m \) and \( \texttt{vars}(\ell ) \supseteq \texttt{vars}(r) \). It is left-linear if no variable occurs more than once in \( \ell \). A simply-typed term rewriting system \( {(\mathbb {F},\mathcal {R})}\) consists of a signature \( \mathbb {F}\) and a set \( \mathcal {R}\) of rewrite rules over \( \textsf{T}(\mathbb {F},\mathbb {X})\). In this paper, we require that all rules have base type. An STRS is innermost orthogonal if all rules are left-linear and, for any two distinct rules \( \ell _1 \rightarrow r_1,\ell _2 \rightarrow r_2 \), there are no substitutions \( \gamma ,\delta \) such that \( \ell _1\gamma = \ell _2\delta \). A reducible expression (redex) is a term of the form \( \ell \gamma \) for a rule \(\ell \rightarrow r\) and substitution \( \gamma \). The innermost rewrite relation induced by \( \mathcal {R}\) is defined as follows:

  • \( \ell \gamma \rightarrow _\mathcal {R}r\gamma \), if \( \ell \rightarrow r \in \mathcal {R}\) and \( \ell \gamma \) has no proper subterm that is a redex;

  • \( s\,t\rightarrow _\mathcal {R}u\,t\), if \( s\rightarrow _\mathcal {R}u\);

  • \( s\,t\rightarrow _\mathcal {R}s\,u\), if \( t\rightarrow _\mathcal {R}u\).

We write \( \mathrel {\rightarrow _\mathcal {R}^{+}}\) for the transitive closure of \( \rightarrow _\mathcal {R}\). An STRS \( \mathcal {R}\) is innermost terminating if no infinite rewrite sequence \( s\rightarrow _\mathcal {R}t\rightarrow _\mathcal {R}\ldots \) exists. It is innermost confluent if \( s\mathrel {\rightarrow _\mathcal {R}^{+}}t\) and \( s\mathrel {\rightarrow _\mathcal {R}^{+}}u\) implies that some \( v \) exists with \( t\mathrel {\rightarrow _\mathcal {R}^{+}}v \) and \( u\mathrel {\rightarrow _\mathcal {R}^{+}}v \). It is well-known that innermost orthogonality implies innermost confluence. In this paper, we will typically drop the “innermost” adjective and simply refer to terminating/orthogonal/confluent STRSs.

Example 1

Let \( \mathbb {B}= \{ \textsf{nat}\} \) and \( \textsf{0}\mathrel {:}\textsf{nat}, \textsf{s}\mathrel {:}\textsf{nat}\Rightarrow \textsf{nat}, \textsf{add},\textsf{mult}\mathrel {:}\textsf{nat}\Rightarrow \textsf{nat}\Rightarrow \textsf{nat}\), and \( \textsf{funcProd}\mathrel {:}(\textsf{nat}\Rightarrow \textsf{nat}) \Rightarrow \textsf{nat}\Rightarrow \textsf{nat}\Rightarrow \textsf{nat}\). We then let \( \mathcal {R}\) be given by:

$$ \begin{array}{rclcrcl} \textsf{add}\,\textsf{0}\,y &{} \rightarrow &{} y &{} \quad &{} \textsf{add}\,(\textsf{s}\,x) \,y &{} \rightarrow &{} \textsf{s}\,(\textsf{add}\,x \,y) \\ \textsf{mult}\,\textsf{0}\,y &{} \rightarrow &{} \textsf{0}&{} \quad &{} \textsf{mult}\,(\textsf{s}\,x) \,y &{} \rightarrow &{} \textsf{add}\,y \,(\textsf{mult}\,x \,y) \\ \textsf{funcProd}\,F \,\textsf{0}\,y &{} \rightarrow &{} y &{} \quad &{} \textsf{funcProd}\,F \,(\textsf{s}\,x) \,y &{} \rightarrow &{} \textsf{funcProd}\,F \,x \,(\textsf{mult}\,y \,(F \,x)) \\ \end{array} $$

Hereafter, we write \( \ulcorner \textsf{n}\urcorner \) for the term \( \textsf{s}\,(\textsf{s}\,(\dots \,\textsf{0}\dots )) \) with \( n \) occurrences of \( \textsf{s}\).
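The rules of Example 1 can be read functionally; the following Python sketch (ours, with the Peano numeral \( \ulcorner \textsf{n}\urcorner \) modeled as the integer \( n \)) mirrors each rule:

```python
# A direct functional reading (ours) of the rules of Example 1 (s = +1, 0 = 0).
def add(x, y):
    # add 0 y -> y ;  add (s x) y -> s (add x y)
    return y if x == 0 else 1 + add(x - 1, y)

def mult(x, y):
    # mult 0 y -> 0 ;  mult (s x) y -> add y (mult x y)
    return 0 if x == 0 else add(y, mult(x - 1, y))

def funcProd(F, x, y):
    # funcProd F 0 y -> y ;  funcProd F (s x) y -> funcProd F x (mult y (F x))
    return y if x == 0 else funcProd(F, x - 1, mult(y, F(x - 1)))

# funcProd F n y computes y * F(n-1) * ... * F(0)
```

This reading makes the intended semantics explicit: \( \textsf{funcProd}\,F\,\ulcorner \textsf{n}\urcorner \,\ulcorner \textsf{y}\urcorner \) normalizes to \( y * \prod _{i<n} F(i) \), which motivates the exponential size interpretation chosen for it in Example 2.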

2.2 Cost–Size Interpretations

For sets \( A \) and \( B \), we write \( A \longrightarrow B \) for the set of functions from \( A \) to \( B \). A quasi-ordered set \( (A,\sqsupseteq ) \) consists of a nonempty set \( A \) and a reflexive and transitive relation \(\sqsupseteq \) on \( A \). For quasi-ordered sets \( (A_1,\sqsupseteq _1) \) and \( (A_2,\sqsupseteq _2) \), we write \( A_1 \Longrightarrow A_2 \) for the set of functions \( f \in A_1 \longrightarrow A_2 \) such that \( f(x) \sqsupseteq _2 f(y) \) whenever \( x \sqsupseteq _1 y \), i.e., \(A_1 \Longrightarrow A_2 \) is the space of functions that preserve quasi-ordering.

For every \( \iota \in \mathbb {B}\), let a quasi-ordered set \( (\mathcal {S}_{\iota },\sqsupseteq _\iota ) \) be given. We extend this to \( {\mathbb {T}(\mathbb {B})}\) by defining \( \mathcal {S}_{\sigma \Rightarrow \tau } = (\mathcal {S}_{\sigma } \Longrightarrow \mathcal {S}_{\tau }, \sqsupseteq _{\sigma \Rightarrow \tau })\) where \( f \sqsupseteq _{\sigma \Rightarrow \tau } g \) iff \( f(x) \sqsupseteq _\tau g(x) \) for any \( x \in \mathcal {S}_{\sigma } \). Given a function \( \mathcal {J}_{}^\textsf{s} \) mapping \( \textsf{f}\in \varSigma \) to some \( \mathcal {J}_{\textsf{f}}^\textsf{s} \in \mathcal {S}_{\texttt{typeOf}(\textsf{f})} \) and a valuation \( \alpha \) mapping each \( x \in \mathbb {X}_\sigma \) to an element of \( \mathcal {S}_{\sigma } \), we can map each term \( s \mathrel {:}\sigma \) to an element of \( \mathcal {S}_{\sigma } \) naturally as follows: (a) \( {\llbracket {} x \rrbracket }^\textsf{s}_\alpha = \alpha (x) \); (b) \( {\llbracket {} \textsf{f} \rrbracket }^\textsf{s}_\alpha = \mathcal {J}_{\textsf{f}}^\textsf{s} \); (c) \( {\llbracket {} s \,t \rrbracket }^\textsf{s}_\alpha = {\llbracket {} s \rrbracket }^\textsf{s}_\alpha ({\llbracket {} t \rrbracket }^\textsf{s}_\alpha ) \).
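Size interpretations can be evaluated mechanically; here is a Python sketch (ours) for the \( \textsf{nat}\)-signature of Example 1, using size components in the style of Example 2 below:

```python
# Evaluating size interpretations (a sketch of ours): S_nat = (N, >=), with
# J^s_0 = 0, J^s_s = x+1, J^s_add = x+y, J^s_mult = x*y (curried below).
J_size = {
    "0": 0,
    "s": lambda x: x + 1,
    "add": lambda x: lambda y: x + y,
    "mult": lambda x: lambda y: x * y,
}

def size(term):
    """[[t]]^s: a symbol maps to its interpretation; term application becomes function application."""
    if isinstance(term, str):
        return J_size[term]
    s, t = term
    return size(s)(size(t))

# [[ mult (s 0) (s (s 0)) ]]^s = (0+1) * (0+1+1) = 2
t = (("mult", ("s", "0")), ("s", ("s", "0")))
```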

For every type \( \sigma \) with \( \texttt{ord}(\sigma ) \le 2 \), we define \( \mathcal {C}_{\sigma } \) as follows: (a) \( \mathcal {C}_{\kappa } = \mathbb {N}\) for \( \kappa \in \mathbb {B}\); (b) \( \mathcal {C}_{\iota \Rightarrow \tau } = \mathcal {S}_{\iota } \Longrightarrow \mathcal {C}_{\tau } \) for \( \iota \in \mathbb {B}\); and (c) \( \mathcal {C}_{\sigma \Rightarrow \tau } = \mathcal {C}_{\sigma } \Longrightarrow \mathcal {S}_{\sigma } \Longrightarrow \mathcal {C}_{\tau } \) if \( \texttt{ord}(\sigma ) = 1 \). We want to interpret terms \( s\mathrel {:}\sigma \) where both \( \sigma \) and all variables occurring in \( s\) are of type order either \( 0 \) or \( 1 \), as is the case for the left- and right-hand side of rules. Thus, we let \( \mathcal {J}_{}^\textsf{c} \) be a function mapping \( \textsf{f}\in \varSigma \) to some \( \mathcal {J}_{\textsf{f}}^\textsf{c} \in \mathcal {C}_{\texttt{typeOf}(\textsf{f})} \) and assume given, for each type \( \sigma \), valuations \( \alpha : \mathbb {X}_\sigma \longrightarrow \mathcal {S}_{\sigma } \) and \( \zeta : \mathbb {X}_\sigma \longrightarrow \mathcal {C}_{\sigma } \). We then define:

$$ \begin{array}{rcl} {\llbracket {} x\,s_1 \cdots s_n \rrbracket }^\textsf{c}_{\alpha ,\zeta } &{} = &{} \zeta (x)({\llbracket {} s_1 \rrbracket }^\textsf{s}_\alpha ,\dots ,{\llbracket {} s_n \rrbracket }^\textsf{s}_\alpha ) \\ {\llbracket {} \textsf{f}\,s_1 \cdots s_k \,t_1 \cdots t_n \rrbracket }^\textsf{c}_{\alpha ,\zeta } &{} = &{} \mathcal {J}_{\textsf{f}}^\textsf{c}( {\llbracket {} s_1 \rrbracket }^\textsf{c}_{\alpha ,\zeta },{\llbracket {} s_1 \rrbracket }^\textsf{s}_\alpha , \ldots , {\llbracket {} s_k \rrbracket }^\textsf{c}_{\alpha ,\zeta },{\llbracket {} s_k \rrbracket }^\textsf{s}_\alpha , {\llbracket {} t_1 \rrbracket }^\textsf{s}_\alpha ,\ldots , {\llbracket {} t_n \rrbracket }^\textsf{s}_\alpha ) \end{array} $$

We let \( \textsf{cost}(s)_{\alpha , \zeta } = \sum \{ {\llbracket {} t \rrbracket }^\textsf{c}_{\alpha ,\zeta } \mid s \mathrel {\unrhd }t \text { and } t \text { is a non-variable term of base type}\} \). This is all well-defined under our assumptions that all variables have a type of order \( 0 \) or \( 1 \), and \( \textsf{f}\mathrel {:}(\vec {\iota _1} \Rightarrow \kappa _1) \Rightarrow \cdots \Rightarrow (\vec {\iota _k} \Rightarrow \kappa _k) \Rightarrow \nu _1 \Rightarrow \cdots \Rightarrow \nu _l \Rightarrow \iota \). We also define \( \mathsf {cost'}(s)_{\alpha , \zeta } = \sum \{ {\llbracket {} t \rrbracket }^\textsf{c}_{\alpha ,\zeta } \mid s \mathrel {\unrhd }t \text { and } t \notin \mathbb {X}\text { is of base type and not in normal form} \} \).

A cost–size interpretation \( \mathcal {F}\) for a second order signature \( \mathbb {F}= (\mathbb {B}, \varSigma , \texttt{typeOf}) \) is a choice of a quasi-ordered set \( \mathcal {S}_{\iota } \), for each \( \iota \in \mathbb {B}\), along with cost- and size-interpretations \( \mathcal {J}_{}^\textsf{c} \) and \( \mathcal {J}_{}^\textsf{s} \) defined as above. Let \( {(\mathbb {F},\mathcal {R})}\) be an STRS over \( \mathbb {F}\). We say \( {(\mathbb {F},\mathcal {R})}\) is compatible with a cost–size interpretation if for any valuations \( \alpha \) and \( \zeta \), we have (a) \( {\llbracket {} \ell \rrbracket }^\textsf{c}_{\alpha , \zeta } > \textsf{cost}(r)_{\alpha ,\zeta } \) and (b) \( {\llbracket {} \ell \rrbracket }^\textsf{s}_{\alpha } \sqsupseteq {\llbracket {} r \rrbracket }^\textsf{s}_{\alpha } \), for all rules \( \ell \rightarrow r \) in \( \mathcal {R}\). In this case we say such cost–size interpretation orients all rules in \( \mathcal {R}\).

Theorem 1 (Innermost Compatibility)

If \( \mathcal {R}\) is an STRS compatible with a cost–size interpretation \( \mathcal {F}\), then for any valuations \( \alpha \) and \( \zeta \) we have \( \mathsf {cost'}(s)_{\alpha ,\zeta } > \mathsf {cost'}(t)_{\alpha ,\zeta } \) and \( {\llbracket {} s \rrbracket }^\textsf{s}_\alpha \sqsupseteq {\llbracket {} t \rrbracket }^\textsf{s}_\alpha \) whenever \( s \rightarrow _\mathcal {R}t \).

From compatibility, we have that if \( s_0 \rightarrow _\mathcal {R}\cdots \rightarrow _\mathcal {R}s_n \), then \( n \le \mathsf {cost'}(s_0) \). Hence, \( \mathsf {cost'}(s) \) bounds the derivation height of \( s \). This follows from [26, Corollary 34], although we significantly simplified the presentation: the limitation to second-order fully applied rules and the lack of abstraction terms allow us to avoid many of the complexities in [26]. We also adapted it to innermost rather than call-by-value evaluation. A correctness proof of this version is supplied in [5]. Since \( \alpha \) and \( \zeta \) are universally quantified, we typically omit them, and just write \( x \) instead of \( \alpha (x) \) and \( F^c \) instead of \( \zeta (F) \).

Example 2

We let \( \mathcal {S}_{\textsf{nat}} = (\mathbb {N},\ge ) \) and assign \( \mathcal {J}_{\textsf{0}}^\textsf{s} = 0 \) and \( \mathcal {J}_{\textsf{s}}^\textsf{s} = \lambda \!\!\!\lambda x.x+1 \), as well as \( \mathcal {J}_{\textsf{0}}^\textsf{c} = 0 \) and \( \mathcal {J}_{\textsf{s}}^\textsf{c} = \lambda \!\!\!\lambda x.0 \). This gives us \( {\llbracket {} \ulcorner \textsf{n}\urcorner \rrbracket }^\textsf{s} = n \) for all \( n \in \mathbb {N}\), and \( {\llbracket {} \ulcorner \textsf{n}\urcorner \rrbracket }^\textsf{c} = \textsf{cost}(\ulcorner \textsf{n}\urcorner ) = 0 \). Now, we let \( \mathcal {J}_{\textsf{add}}^\textsf{s} = \lambda \!\!\!\lambda xy.x+y \) and \( \mathcal {J}_{\textsf{mult}}^\textsf{s} = \lambda \!\!\!\lambda xy.x*y \); then indeed \( {\llbracket {} \ell \rrbracket }^\textsf{s} \ge {\llbracket {} r \rrbracket }^\textsf{s} \) for the first four rules of Example 1 (e.g., \( {\llbracket {} \textsf{mult}\,(\textsf{s}\,x) \,y \rrbracket }^\textsf{s} = (x+1) * y \ge y + (x * y) = {\llbracket {} \textsf{add}\,y \,(\textsf{mult}\,x \,y) \rrbracket }^\textsf{s} \)). Moreover, let us choose \( \mathcal {J}_{\textsf{add}}^\textsf{c} = \lambda \!\!\!\lambda x y.x + 1 \) and \( \mathcal {J}_{\textsf{mult}}^\textsf{c} = \lambda \!\!\!\lambda x y.x * y + 2 * x + 1 \). Then also \( {\llbracket {} \ell \rrbracket }^\textsf{c} > \textsf{cost}(r) \) for all rules; for example, \( {\llbracket {} \textsf{mult}\,(\textsf{s}\,x) \,y \rrbracket }^\textsf{c} = (x + 1) * y + 2 * x + 3 > (y + 1) + (x * y + 2 * x + 1) = {\llbracket {} \textsf{add}\,y \,(\textsf{mult}\,x \,y) \rrbracket }^\textsf{c} + {\llbracket {} \textsf{mult}\,x \,y \rrbracket }^\textsf{c} = \textsf{cost}(\textsf{add}\,y \,(\textsf{mult}\,x \,y)) \). 
Regarding \( \textsf{funcProd}\), we can orient both rules by choosing \( \mathcal {J}_{\textsf{funcProd}}^\textsf{s} = \lambda \!\!\!\lambda F x y.y * {\max (F(x),1)}^x \) and \( \mathcal {J}_{\textsf{funcProd}}^\textsf{c} = \lambda \!\!\!\lambda F G x y.2 * x * y * {\max (F(x),1)}^{x+1} + x * G(x) + 2 * x + 1 \). This works due to the monotonicity assumption, which provides, e.g., \( G(x+1) \ge G(x) \). (This function is not polynomial, but that is allowed in the general case.)
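Compatibility can be sanity-checked numerically. The following Python sketch (ours) verifies, on small inputs, the cost orientation of the rule \( \textsf{mult}\,(\textsf{s}\,x)\,y \rightarrow \textsf{add}\,y\,(\textsf{mult}\,x\,y) \), taking \( \mathcal {J}_{\textsf{add}}^\textsf{c}(x,y) = x + 1 \) and \( \mathcal {J}_{\textsf{mult}}^\textsf{c}(x,y) = x * y + 2 * x + 1 \), which match the values computed in Example 2:

```python
# A numeric sanity check (ours) of the cost orientation for the rule
# mult (s x) y -> add y (mult x y).
J_add_c = lambda x, y: x + 1
J_mult_c = lambda x, y: x * y + 2 * x + 1

def lhs_cost(x, y):
    return J_mult_c(x + 1, y)                    # [[ mult (s x) y ]]^c

def rhs_cost(x, y):
    # cost(add y (mult x y)) sums both base-type, non-variable subterm costs;
    # the second size argument of add is [[ mult x y ]]^s = x * y
    return J_add_c(y, x * y) + J_mult_c(x, y)

assert all(lhs_cost(x, y) > rhs_cost(x, y) for x in range(30) for y in range(30))
```

Of course, such finite checks only probe the inequality; the compatibility requirement quantifies over all valuations.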

2.3 Basic Feasible Functionals

We assume familiarity with Turing machines. In this paper, we consider deterministic multi-tape Turing machines. These are, conceptually, machines consisting of a finite set of states and a fixed number of right-infinite tapes divided into cells. Each tape is equipped with a tape head that scans the symbols on the tape's cells and may write to them. The head can move to the left or right. Let \( W= {\{0,1\}}^* \). A k-ary Oracle Turing Machine (OTM) is a deterministic multi-tape Turing machine with at least \( 2k + 1 \) tapes: one main tape (for input/output), \( k \) designated query tapes, and \( k \) designated answer tapes. It also has \( k \) distinct query states \( q_i \) and \( k \) answer states \( a_i \).

A computation with a k-ary OTM \( M\) requires \( k \) fixed oracle functions \( f_1, \ldots , f_k : W\longrightarrow W\). We write \( M_{\vec {f}} \) to denote a run of \( M\) with these functions. A run of \( M_{\vec {f}} \) on \( w \) starts with \( w \) written in the main tape. It ends when the machine halts, and yields the word that is written in the main tape as output. As usual, we only consider machines that halt on all inputs. The computation proceeds as usual for non-query states. To query the value of \( f_i \) on \( w \), the machine writes \( w \) on the corresponding query tape and enters the query state \( q_i \). Then, in one step, the machine transitions to the answer state \( a_i \) as follows: (a) the query value \( w \) written in the query tape for \( f_i \) is read; (b) the contents of the answer tape for \(f_i \) are changed to \( f_i(w) \); (c) the query value \( w \) is erased from the query tape; and (d) the head of the answer tape is moved to its first symbol. The running time of \( M_{\vec {f}} \) on \( w \) is the number of steps used in the computation.
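The query protocol can be modeled schematically as follows (a Python sketch of ours; it tracks only the step count, not tape contents):

```python
# A schematic model (ours, not the paper's formalism) of the oracle-call
# protocol: each query is answered in a single machine step.
class OTM:
    def __init__(self, oracle):
        self.oracle = oracle
        self.steps = 0

    def query(self, w):
        self.steps += 1           # the transition q_i -> a_i counts as one step
        return self.oracle(w)     # the answer tape now holds oracle(w)

# A machine computing f(f(w)) spends two query steps, however hard f is.
m = OTM(lambda w: w + "1")
out = m.query(m.query("0"))       # "011", with m.steps == 2
```

The point of the sketch is that the cost of producing \( f_i(w) \) is a single step of the machine, independently of \( f_i \); only the lengths of queries and answers influence the running time.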

A type-1 function is a mapping in \( W\longrightarrow W\). A type-2 functional of rank \( (k,l) \) is a mapping in \( {(W\longrightarrow W)}^k \longrightarrow W^l \longrightarrow W\).

Definition 1

We say an OTM \( M\) computes a type-2 functional \( \varPsi \) of rank \( (k,l) \) iff for all type-1 functions \( f_1, \dots , f_k\) and \( x_1,\dots ,x_l \in W\), whenever \( M_{f_1,\dots ,f_k} \) is started with \( x_1, \dots ,x_l \) written on its main tape (separated by blanks), it halts with \( \varPsi (f_1,\dots ,f_k,x_1,\dots ,x_l) \) written on its main tape.

Definition 2

Let \( {\{ F_1, \dots , F_k \}} \) be a set of type-1 variables and \( {\{ x_1, \dots , x_l \}} \) a set of type-0 variables. The set \( {\texttt{Pol}^2_\mathbb {N}[F_1, \dots , F_k; x_1, \dots , x_l]} \) of second-order polynomials over \( \mathbb {N}\) with indeterminates \( F_1, \dots , F_k, x_1, \dots , x_l \) is generated by:

$$\begin{aligned} P, Q {:}{=}n \mid x \mid P + Q \mid P * Q \mid F(Q) \end{aligned}$$

where \( n \in \mathbb {N}\), \( x \in \{ x_1 , \dots , x_l \} \), and \( F \in \{ F_1, \dots , F_k \} \).

Notice that a polynomial expression can be viewed as a type-2 functional in the natural way, e.g., \( P(F,x) = 3 * F(x) + x \) is a second-order polynomial functional. Given \( w \in W \), we write \( |w| \) for its length and define the length \( |f| \) of \( f : W \longrightarrow W \) as \( |f| = \lambda \!\!\!\lambda n. \max \limits _{{|y|} \le n} |f(y)| \). This allows us to define \(\texttt{BFF}\) as the class of functionals computable by OTMs with running time bounded by a second-order polynomial.
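For illustration, here is a Python sketch (ours) evaluating the second-order polynomial \( P(F,x) = 3 * F(x) + x \) from above, and computing \( |f| \) by brute force on short words:

```python
# Evaluating a second-order polynomial and the length |f| of a type-1
# function by brute force (a sketch of ours; feasible for short words only).
from itertools import product

def P(F, x):
    return 3 * F(x) + x            # the polynomial P(F, x) = 3 * F(x) + x

def length_of(f, n, alphabet=("0", "1")):
    """|f|(n) = max |f(y)| over all words y with |y| <= n."""
    words = [""] + ["".join(t) for k in range(1, n + 1) for t in product(alphabet, repeat=k)]
    return max(len(f(y)) for y in words)

# For f appending one bit, |f|(n) = n + 1, so P(|f|, 3) = 3 * 4 + 3 = 15
```

Note that \( |f| \) is weakly monotone by construction, which is what makes it a suitable argument for a second-order polynomial bound.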

Definition 3

A type-2 functional \( \varPsi \) is in \(\texttt{BFF}\) iff there exist an OTM \( M\) and a second-order polynomial \( P \) such that \( M \) computes \( \varPsi \) and for all \( \vec {f} \) and \( \vec {x} \): the running time of \( M_{f_1,\dots , f_k} \) on \( x_1,\dots ,x_l \) is at most \( P(|f_1|,\dots , |f_k|,|x_1|,\dots ,|x_l|) \).

3 Statement of the Main Result

The main result of this paper roughly states that \(\texttt{BFF}\) consists exactly of those type-2 functionals computed by an STRS compatible with a polynomially bounded cost–size tuple interpretation. To formally state this result, we must first define what it means for an STRS to compute a type-2 functional and define precisely the class of cost–size interpretations we are interested in.

Indeed, let us start by encoding words in \( W\) as terms. We let \( \textsf{bit},\textsf{word}\in \mathbb {B}\) and introduce symbols \( \textsf{o},\textsf{i}\mathrel {:}\textsf{bit}\) and \( \mathsf {[]}\mathrel {:}\textsf{word},\ \mathsf {::}\mathrel {:}\textsf{bit}\Rightarrow \textsf{word}\Rightarrow \textsf{word}\). Then for instance \( 001 \) is encoded as the term \( \mathsf {::}\,\textsf{o}\,(\mathsf {::}\,\textsf{o}\,(\mathsf {::}\,\textsf{i}\,\mathsf {[]})) \). We use the cleaner list-like notation \( [\textsf{o}; \textsf{o}; \textsf{i}] \) in practice. Let \( \underline{\textsf{w}} \) denote the term encoding of a word \( w \). Next, we encode type-1 functions as a possibly infinite set of one-step rewrite rules.
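This encoding is easy to implement; the following Python sketch (ours; an application of \( \mathsf {::}\) to its two arguments is modeled as a tagged triple) converts between words and their term encodings:

```python
# Converting between words in W = {0,1}* and their term encodings (our sketch).
def encode(w):
    if w == "":
        return "[]"
    bit = "o" if w[0] == "0" else "i"
    return ("::", bit, encode(w[1:]))

def decode(t):
    out = ""
    while t != "[]":
        _, bit, t = t
        out += "0" if bit == "o" else "1"
    return out

# encode("001") yields ("::", "o", ("::", "o", ("::", "i", "[]"))), i.e. [o; o; i]
```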

Definition 4

Consider a type-1 function \( f : W\longrightarrow W\) and let \( \textsf{S}_f \mathrel {:}\textsf{word}\Rightarrow \textsf{word}\) be a fresh function symbol. A set of rules \( \mathcal {R}_f \) defines \( f \) by way of \( \textsf{S}_f \) if for each \( w \in W\) there is exactly one rule of the form \( \textsf{S}_f \,\underline{\textsf{w}} \rightarrow \underline{\mathsf {f(w)}} \) in \( \mathcal {R}_f \).

Henceforth, we assume that the STRS \( {(\mathbb {F},\mathcal {R})}\) at hand is such that \( \mathbb {F}\) contains \( \textsf{o}, \textsf{i}, \mathsf {[]}, \mathsf {::}\) typed as above and a distinguished symbol \( \textsf{F}\mathrel {:}{(\textsf{word}\Rightarrow \textsf{word})}^k \Rightarrow \textsf{word}^l \Rightarrow \textsf{word}\). Given type-1 functions \( f_1, \dots , f_k \), we write \( \mathbb {F}_{\vec {f}} \) for \( \mathbb {F}\) extended with function symbols \( \textsf{S}_{f_i} \mathrel {:}\textsf{word}\Rightarrow \textsf{word}\), with \( 1 \le i \le k \), and let \( \mathcal {R}_{+\vec {f}}= \mathcal {R}\cup \bigcup _{i=1}^k \mathcal {R}_{f_i} \). Now we can define the notion of type-2 computability for such STRSs.

Definition 5

Let \( {(\mathbb {F},\mathcal {R})}\) be an STRS. We say that \( \textsf{F}\) computes the type-2 functional \( \varPsi \) in \( (\mathbb {F},\mathcal {R}) \) iff for all type-1 functions \( f_1, \dots , f_k \) and all \( w_1,\dots ,w_l \in W\), \( \textsf{F}\,\textsf{S}_{f_1} \cdots \textsf{S}_{f_k} \,\underline{\mathsf {w_1}} \cdots \underline{\mathsf {w_l}} \rightarrow _{\mathcal {R}_{+\vec {f}}}^+ \underline{\textsf{u}} \), where \( u = \varPsi (f_1,\dots ,f_k,w_1,\dots ,w_l) \).

Next, we define what we mean by polynomially bounded interpretation.

Definition 6

We say an STRS \( {(\mathbb {F},\mathcal {R})}\) admits a polynomially bounded interpretation iff \( {(\mathbb {F},\mathcal {R})}\) is compatible with a cost–size interpretation such that:

  • \( \mathcal {S}_{\textsf{word}} = (\mathbb {N},\ge ) \);

  • \( \mathcal {J}_{\textsf{o}}^\textsf{c} = \mathcal {J}_{\textsf{i}}^\textsf{c} = \mathcal {J}_{\mathsf {[]}}^\textsf{c} = 0 \), \( \mathcal {J}_{\mathsf {::}}^\textsf{c} = \lambda \!\!\!\lambda xy.0\), and \( \mathcal {J}_{\mathsf {::}}^\textsf{s} = \lambda \!\!\!\lambda xy.x + y +c \) for some \( c \ge 1 \);

  • \( \mathcal {J}_{\textsf{F}}^\textsf{c} \) is bounded by a polynomial in \( {\texttt{Pol}^2_\mathbb {N}[F_1^\textsf{c},F_1^\textsf{s},\ldots ,F_k^\textsf{c},F_k^\textsf{s};x_1, \ldots ,x_l]} \).

Finally, we can formally state our main result.

Theorem 2

A type-2 functional \( \varPsi \) is in \(\texttt{BFF}\) if and only if there exists a finite orthogonal STRS \( {(\mathbb {F},\mathcal {R})}\) such that the distinguished symbol \( \textsf{F}\) computes \( \varPsi \) in \( {(\mathbb {F},\mathcal {R})}\) and \( \mathcal {R}\) admits a polynomially bounded cost–size interpretation.

We prove this result in two parts. First, we prove soundness in Section 4 which states that every type-2 functional computed by an STRS as above is in \(\texttt{BFF}\). Then in Section 5 we prove completeness which states that every functional in \(\texttt{BFF}\) can be computed by such an STRS. In order to simplify proofs, we only consider type-2 functions of rank (1,1). We claim that the results can be easily generalized, but the proofs become more tedious when handling multiple arguments.

Example 3

Let us consider the type-2 functional defined by \( \varPsi {:}{=}\lambda \!\!\!\lambda f x. \sum \limits _{i < |x|} f(i) \). Notice that \( \varPsi \) adds all \( f(i) \) over each word \( i \in W\) whose value (as a natural number) is smaller than the length of \( x \). This functional was proved to lie in \(\texttt{BFF}\) in [21], where the authors utilized an encoding of \( \varPsi \) as a \( \textsf{BTLP}_2 \) program. We can encode \( \varPsi \) as an STRS as follows. Let us consider ancillary symbols \( \textsf{lengthOf}\mathrel {:}\textsf{word}\Rightarrow \textsf{nat}\) and \( \textsf{toBin}\mathrel {:}\textsf{nat}\Rightarrow \textsf{word}\). The former computes the length of a given word and the latter converts a number from unary to binary representation. We also consider rules for addition on binary words, i.e., \( \mathbin {\mathsf {+_{B}}}\mathrel {:}\textsf{word}\Rightarrow \textsf{word}\Rightarrow \textsf{word}\), which we use in infix notation below.

$$\begin{aligned} \textsf{compute}\,F\,x\,\textsf{0}\,acc&\rightarrow acc\\ \textsf{compute}\,F\,x\,(\textsf{s}\,i) \,acc&\rightarrow \textsf{compute}\,F\,x\,i\,(acc\,\mathbin {\mathsf {+_{B}}}\,(F \,(\textsf{toBin}\,i)))\\ \textsf{start}\,F\,x&\rightarrow \textsf{compute}\,F\,x\,(\textsf{lengthOf}\,x) \,\mathsf {[]}\end{aligned}$$

Now, if we want to compute \( \varPsi (f,x) \) we simply reduce the term \( \textsf{start}\,\textsf{S}_f \,\underline{\textsf{x}} \) to normal form. To show that \( \varPsi \) is in \( \texttt{BFF}{} \) via our rewriting formalism, we need to exhibit a cost–size tuple interpretation for this system that satisfies Definition 6; see [5, Example 3].
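The intended semantics of these rules can be cross-checked with a direct Python reading (a sketch of ours; words are bit strings and \( \mathbin {\mathsf {+_{B}}} \) is ordinary binary addition):

```python
# Direct functional reading (ours) of the rules of Example 3:
# Psi(f, x) sums f(i) over all i < |x|.
def to_bin(n):
    # toBin: a natural number to its binary word
    return format(n, "b") if n > 0 else "0"

def psi(f, x):
    total = 0
    for i in range(len(x)):              # compute F x (lengthOf x) [] unfolds |x| times
        total += int(f(to_bin(i)), 2)    # acc +_B (F (toBin i))
    return to_bin(total)

# With f the identity and |x| = 4: 0 + 1 + 2 + 3 = 6, i.e. the word "110"
```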

4 Soundness

In order to prove soundness, let us fix a finite orthogonal STRS \( \mathcal {R}\) that computes a type-2 functional \( \varPsi \) and admits a polynomially bounded cost–size interpretation. We proceed to show that \( \varPsi \) is in \(\texttt{BFF}\) roughly as follows:

  1. 1.

    Since \( \mathcal {R}\) computes \( \varPsi \) and admits a polynomially bounded interpretation, we show that so does the extended system \( \mathcal {R}_{+f}\) (Definition 5). The restriction on \( \mathcal {J}_{\mathsf {::}}^\textsf{s} \) (Definition 6) implies that \( {\llbracket {} \textsf{F}\,\textsf{S}_f \,\underline{\textsf{w}} \rrbracket }^\textsf{c} \) is bounded by a second-order polynomial over \( |f|, |w| \). We show this in Lemma 1. By compatibility (Theorem 1), we can do at most polynomially many steps when reducing \( \textsf{F}\,\textsf{S}_f \,\underline{\textsf{w}} \).

2.

    The cost polynomial restricts the size of any input that the function variable \( F \) is applied to (e.g., a cost bound of \( 3 + F^\textsf{c}(m) \) implies that \( F \) is never called on a term with size interpretation \( > m \)). This is the subject of Lemma 3.

3.

    Using the observations above, we then show that by graph rewriting we can simulate \( \mathcal {R}_{+f}\), computing each \( \mathcal {R}_{+f}\)-reduction step in polynomial time on an OTM. This guarantees that \( \varPsi \) is in \(\texttt{BFF}\) (Theorem 3).

4.1 Interpreting The Extended STRS, Polynomially

Our first goal is to provide a polynomially bounded cost–size interpretation to the extended system \( \mathcal {R}_{+f}\). We start with the observation that the size interpretation of words in \( W \) is proportional to their length. Indeed, since \( \mathcal {J}_{\mathsf {::}}^\textsf{s} = \lambda \!\!\!\lambda xy.x+y+c \) (Definition 6), let \( \mu := \max (\mathcal {J}_{\textsf{o}}^\textsf{s},\mathcal {J}_{\textsf{i}}^\textsf{s}) + c \) and \( \nu := \mathcal {J}_{\mathsf {[]}}^\textsf{s} \). Then, for all \( w \in W\):

$$\begin{aligned} |w| \le {\llbracket {} \underline{\textsf{w}} \rrbracket }^\textsf{s} \le \mu * |w| + \nu \end{aligned}$$
(1)

Recall that by Definition 4 the extended system \( \mathcal {R}_{+f}\) has possibly infinitely many rules of the form \( \textsf{S}_f \,\underline{\textsf{w}} \rightarrow \underline{\mathsf {f(w)}} \). These rules represent calls to an oracle that computes \( f \) in a single step. Thus, we set their cost to 1. The size should be given by the length of the oracle output, taking the overhead of interpretation into account. Hence, we obtain:

$$ \mathcal {J}_{\textsf{S}_f}^\textsf{c} = \lambda \!\!\!\lambda x. 1 \quad \quad \mathcal {J}_{\textsf{S}_f}^\textsf{s} = \lambda \!\!\!\lambda x. \mu * |f|(x) + \nu $$

This is weakly monotonic because \( |f| \) is. It orients the rules in \( \mathcal {R}_f \) because \( {\llbracket {} \textsf{S}_f\,\underline{\textsf{w}} \rrbracket }^\textsf{c} = 1 > 0 = \textsf{cost}(\underline{\mathsf {f(w)}}) \), and \( {\llbracket {} \textsf{S}_f\,\underline{\textsf{w}} \rrbracket }^\textsf{s} = \mu * |f|({\llbracket {} \underline{\textsf{ w}} \rrbracket }^\textsf{s}) + \nu \ge \mu * |f|(|w|) + \nu \ge \mu * |f(w)| + \nu \) by definition of \( |f| \), which is greater than or equal to \( {\llbracket {} \underline{\mathsf {f(w)}} \rrbracket }^\textsf{s} \).

As \( \mathcal {J}_{\textsf{F}}^\textsf{c} \) is bounded by a second-order polynomial \( \lambda \!\!\!\lambda F^c F^s x.P \), we can let \( D(F,n) := P(\lambda \!\!\!\lambda x.1,\lambda \!\!\!\lambda x.\mu *F(x)+\nu ,\mu *n+\nu ) \). Then \( D \) is a second-order polynomial, and \( D(|f|,|w|) \ge \mathcal {J}_{\textsf{F}}^\textsf{c}( \mathcal {J}_{\textsf{S}_f}^\textsf{c},\mathcal {J}_{\textsf{S}_f}^\textsf{s},{\llbracket {} \underline{\textsf{w}} \rrbracket }^\textsf{s}) = \textsf{cost}(\textsf{F}\,\textsf{S}_f\,\underline{\textsf{w}}) \). By Theorem 1 we see:

Lemma 1

There exists a second-order polynomial \( D \) so that \( D(|f|,|w|) \) bounds the derivation height of \( \textsf{F}\,\textsf{S}_f\,\underline{\textsf{w}} \) for any \(f \in W\longrightarrow W\) and \( w \in W\).

Notice that this lemma does not imply that \( \varPsi \) is in \(\texttt{BFF}\). It only guarantees that there is a polynomial bound to the number of rewriting steps for such systems. However, it does not immediately follow that this number is a reasonable bound for the actual computational cost of simulating a reduction on an OTM. Consider for example a rule \( \textsf{f}\,(\textsf{s}\,n) \,t \rightarrow \textsf{f}\,n \,(\textsf{c} \,t \,t) \). Every step doubles the size of the term. A naive implementation – which copies the duplicated term in each step – would take exponential time. Moreover, a single step using the oracle can create a very large output, which is not considered part of the cost of the reduction, even though an OTM would be unable to use it without first fully reading it.
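To make the doubling concrete, here is a small back-of-the-envelope sketch (not part of the formal development) contrasting the term size under naive copying with the vertex count under sharing for the rule \( \textsf{f}\,(\textsf{s}\,n)\,t \rightarrow \textsf{f}\,n\,(\textsf{c}\,t\,t) \):

```python
# After n steps of f (s n) t -> f n (c t t), the fully copied term has
# exponentially many leaves, while a shared-graph representation only
# gains one new c-vertex per step. Illustrative size accounting only.

def naive_size(n):
    size = 1                 # size of the initial argument t
    for _ in range(n):
        size = 2 * size + 1  # c t t copies t twice, plus the c node
    return size

def shared_size(n):
    return 1 + n             # one shared t, plus one c-vertex per step

assert naive_size(10) == 2**11 - 1   # exponential growth
assert shared_size(10) == 11         # linear growth
```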

Therefore, in order to prove soundness, we show how to realize a reasonable implementation of rewriting w.r.t. OTMs. In essence, we will show that (1) oracle calls are not problematic in the presence of polynomially bounded interpretations, and (2) we can handle duplication with an appropriate representation of rewriting.

4.2 Bounding The Oracle Input

We first deal with the reasonability of oracle calls. We will show that there exists a second-order polynomial \( B \) such that if an oracle call \( \textsf{S}_f\,\underline{\textsf{x}} \) occurs anywhere along the reduction \( \textsf{F}\,\textsf{S}_f\,\underline{\textsf{w}} \mathrel {\rightarrow _\mathcal {R}^{+}}\underline{\textsf{v}} \), then \( |x| \le B(|f|,|w|) \). From this, we know that the growth of the overall term size during an oracle call is at most \( |f|(B(|f|, |w|)) \).

Let \( P \) again be the polynomial bounding \( \mathcal {J}_{\textsf{F}}^\textsf{c} \). Since \( P \) is a second-order polynomial, each occurrence of a sub-expression \( F^c(E) \) in \( P \) is a second-order polynomial, and so is \( E \). Let us enumerate these arguments as \( E_1,\dots ,E_n \). We can then form the new polynomial \( Q \) defined as

$$ Q {:}{=}\sum \limits _i E_i \quad \text {where occurrences of}\ F^c(E_j')\ \text {inside}\ E_i\ \text {are replaced by}\ 1 $$

We let \( B(G,y) := Q(\lambda \!\!\!\lambda z.\mu * G(z) + \nu ,\mu * y + \nu ) \).

Example 4

If \( P = \lambda \!\!\!\lambda F^c F^s x.x * F^\textsf{c}(3+F^\textsf{s}(9*x)) + F^\textsf{c}(12) * F^\textsf{c}(3+x*F^\textsf{c}(2))+5 \), then \( Q = 3 + F^\textsf{s}(9*x) + 12 + 3+x*1 + 2 = 20 + F^\textsf{s}(9*x) + x \). We have \( B(G,x) = 20 + \mu * G(9 * (\mu * x + \nu )) + \nu + (\mu * x + \nu ) = 20 + 2 * \nu + \mu * G(9*\mu *x + 9*\nu ) + \mu * x \).
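Example 4 can also be checked numerically. The sketch below (with sample values for \( \mu \), \( \nu \), and \( G \); all choices are illustrative) compares \( B(G,x) = Q(\lambda \!\!\!\lambda z.\mu *G(z)+\nu ,\ \mu *x+\nu ) \) against its expanded closed form:

```python
# Numeric sanity check of Example 4 (mu, nu, G are illustrative samples).

mu, nu = 3, 2
G = lambda z: z * z + 1          # an arbitrary monotone sample function

def Q(Fs, x):                    # Q = 20 + F^s(9*x) + x
    return 20 + Fs(9 * x) + x

def B(G, x):                     # B(G, x) = Q(lambda z. mu*G(z)+nu, mu*x+nu)
    return Q(lambda z: mu * G(z) + nu, mu * x + nu)

def B_closed(G, x):              # expanded closed form
    return 20 + 2 * nu + mu * G(9 * mu * x + 9 * nu) + mu * x

assert all(B(G, x) == B_closed(G, x) for x in range(20))
```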

Now \( B \) gives an upper bound to the argument values for \( F^\textsf{c}\) that are considered: if a function differs from \( \mathcal {J}_{\textsf{S}_f}^\textsf{c} \) only on argument values greater than \( B(|f|,|w|) \), then we can use it in \( P \) and obtain the same result. Formally:

Lemma 2

Fix \( f,w \). Let \( G \in \mathbb {N}\longrightarrow \mathbb {N}\) with \( G(z) = 1 \) if \( z \le B(|f|,|w|) \). Then \( P(G,\mathcal {J}_{\textsf{S}_f}^\textsf{s},{\llbracket {} \underline{\textsf{w}} \rrbracket }^\textsf{s}) = P(\mathcal {J}_{\textsf{S}_f}^\textsf{c},\mathcal {J}_{\textsf{S}_f}^\textsf{s},{\llbracket {} \underline{\textsf{w}} \rrbracket }^\textsf{s}) \).

This is proved by induction on the form of \( P \), using that \( G \) is never applied on arguments larger than \( B(|f|,|w|) \). Lemma 2 is used in the following key result:

Lemma 3 (Oracle Subterm Lemma)

Let \( f : W\longrightarrow W\) be a type-1 function and \( w \in W\). If \( \textsf{F}\,\textsf{S}_f \,\underline{\textsf{w}} \rightarrow _{\mathcal {R}_{+f}}^* C[\textsf{S}_f\,\underline{\textsf{x}}] \) for some context \( C \), then \( |x| \le B(|f|,|w|) \).

Proof

In view of a contradiction, suppose there exist \( f, w \), and \( x \) such that \( \textsf{F}\,\textsf{S}_f\,\underline{\textsf{w}} \rightarrow _{\mathcal {R}_{+f}}^* C[\textsf{S}_f\,\underline{\textsf{x}}] \) for some context \( C \), and \( |x| > B(|f|, |w|) \). Let us now construct an alternative oracle: let \( \textsf{0}\mathrel {:}\textsf{nat}, \textsf{s}\mathrel {:}\textsf{nat}\Rightarrow \textsf{nat}, \textsf{S}_f' \mathrel {:}\textsf{word}\Rightarrow \textsf{word}\) and \( \textsf{helper}\mathrel {:}\textsf{nat}\Rightarrow \textsf{nat}\Rightarrow \textsf{nat}\), and for \( N := D(|f|,|w|) \), let \( \mathcal {R}_{f,w}' \) be given by:

$$ \begin{array}{rcllcrcl} \textsf{S}_f' \,\underline{\textsf{x}} &{} \rightarrow &{} \underline{\mathsf {f(x)}} &{} \text {if}\ |x| \le B(|f|, |w|) &{} \quad \quad &{} \textsf{helper}\,\textsf{0}\,y&{} \rightarrow &{} y\\ \textsf{S}_f' \,\underline{\textsf{x}} &{} \rightarrow &{} \textsf{helper}\,\ulcorner \textsf{N}\urcorner \,\underline{\mathsf {f(x)}} &{} \text {otherwise} &{} \quad \quad &{} \textsf{helper}\,(\textsf{s}\,x) \,y&{} \rightarrow &{} \textsf{helper}\,x\,y\\ \end{array} $$

where \( \ulcorner \textsf{N}\urcorner \) is the unary number encoding of \( N \) as introduced in Section 2.1. Notice that by definition, the rules for \( \textsf{S}_f' \) produce \( \underline{\mathsf {f(x)}} \) in one step if \( |x| \le B(|f|, |w|) \), but take \(N+2\) steps otherwise. Also observe that \( \textsf{S}_f \) and \( \textsf{S}_f' \) behave the same; that is, \( \textsf{S}_f \,\underline{\textsf{x}} \) and \( \textsf{S}_f' \,\underline{\textsf{x}} \) have the same normal form on any input \( \underline{\textsf{x}} \). We extend the interpretation function of the original signature with:

$$ \mathcal {J}_{\textsf{S}_f'}^\textsf{c} = \lambda \!\!\!\lambda x.\left\{ \begin{array}{ll} 1 &{} \text {if}\ x\le B(|f|, |w|) \\ N+2 &{} \text {if}\ x> B(|f|, |w|) \\ \end{array} \right. \quad \quad \mathcal {J}_{\textsf{S}_f'}^\textsf{s} = \mathcal {J}_{\textsf{S}_f}^\textsf{s} $$
$$ \begin{array}{rclcrclcrclcrcl} \mathcal {J}_{\textsf{helper}}^\textsf{c}&{} = &{} \lambda \!\!\!\lambda x y. x + 1 &{} \quad &{} \mathcal {J}_{\textsf{helper}}^\textsf{s}&{} = &{} \lambda \!\!\!\lambda x y. y &{} \quad &{} \mathcal {J}_{\textsf{0}}^\textsf{s}&{} = &{} 0 &{} \quad &{} \mathcal {J}_{\textsf{s}}^\textsf{s}&{} = &{} \lambda \!\!\!\lambda x. x + 1 \\ \end{array} $$

We easily see that this orients all rules in \( \mathcal {R}_{f,w}' \). Then, by Lemma 2, \( \textsf{cost}(\textsf{F}\,\textsf{S}_f' \,\underline{\textsf{w}}) \le P(\mathcal {J}_{\textsf{S}_f'}^\textsf{c},\mathcal {J}_{\textsf{S}_f'}^\textsf{s}, {\llbracket {} \underline{\textsf{w}} \rrbracket }^\textsf{s}) = P(\mathcal {J}_{\textsf{S}_f}^\textsf{c},\mathcal {J}_{ \textsf{S}_f}^\textsf{s},{\llbracket {} \underline{\textsf{w}} \rrbracket }^\textsf{s}) \le D(|f|,|w|) = N \). Yet, as we have \( \textsf{F}\,\textsf{S}_f\,\underline{\textsf{w}} \rightarrow _{\mathcal {R}_{+f}}^* C[\textsf{S}_f\,\underline{\textsf{x}}] \), we also have \( \textsf{F}\,\textsf{S}_f'\,\underline{\textsf{w}} \rightarrow _{\mathcal {R}\cup \mathcal {R}_{f,w}'}^* C'[\textsf{S}_f' \,\underline{\textsf{x}}] \), where \( C' \) is obtained from \( C \) by replacing all occurrences of \( \textsf{S}_f\) by \( \textsf{S}_f' \). Since \( |x| > B(|f|,|w|) \) by assumption, the reduction \( \textsf{F}\,\textsf{S}_f' \,\underline{\textsf{w}} \rightarrow _{\mathcal {R}\cup \mathcal {R}_{f,w}'}^* C'[\textsf{S}_f' \,\underline{\textsf{x}}] \rightarrow _{\mathcal {R}\cup \mathcal {R}_{f,w}'}^* C'[\underline{\mathsf {f(x)}}] \) takes strictly more than \( N \) steps, contradicting Theorem 1.    \(\square \)
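The step counts claimed for the alternative oracle \( \textsf{S}_f' \) can be tallied in a small sketch (`bound` and `N` stand in for \( B(|f|,|w|) \) and \( D(|f|,|w|) \); the values used are illustrative):

```python
# Rewrite-step count of the alternative oracle rules:
#   S_f' x -> f(x)                    if |x| <= bound   (1 step)
#   S_f' x -> helper N f(x)           otherwise         (1 step)
#   helper (s k) y -> helper k y                        (N steps)
#   helper 0 y -> y                                     (1 step)

def oracle_steps(x_len, bound, N):
    if x_len <= bound:
        return 1
    return 1 + N + 1   # detour through helper: N + 2 steps in total

assert oracle_steps(5, 10, 100) == 1
assert oracle_steps(20, 10, 100) == 102
```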

4.3 Graph Rewriting

Lemma 1 guarantees that if \( \mathcal {R}\) is compatible with a suitable interpretation, then at most polynomially many \( \mathcal {R}_{+f}\)-steps can be performed starting in \( \textsf{F}\,\textsf{S}_f\,\underline{\textsf{w}} \). However, as observed in Section 4.1, this does not yet imply that a type-2 functional computed by an STRS with such an interpretation is in \( \texttt{BFF}\). To simulate a reduction on an OTM, we must find a representation whose size does not increase too much in any given step. The answer is graph rewriting.

Definition 7

A term graph for a signature \( \varSigma \) is a tuple \( (V,\texttt{label},\texttt{succ},{\varLambda }) \) with \( V\) a finite nonempty set of vertices; \( {\varLambda }\in V\) a designated vertex called the root; \( \texttt{label}: V\longrightarrow \varSigma \cup \{ @\} \) a partial function with \( @\) fresh; and \( \texttt{succ}: V\longrightarrow V^* \) a total function such that \( \texttt{succ}(v) = v_1v_2 \) when \( \texttt{label}(v) = @\) and \( \texttt{succ}(v) = \varepsilon \) otherwise. We view this as a directed graph, with an edge from \( v \) to \( v' \) if \( v' \in \texttt{succ}(v) \), and require that this graph is acyclic (i.e., there is no path from any \( v \) to itself). Given term graph \( G\), we will often directly refer to \( V_G, \texttt{label}_G\), etc.

Fig. 1. A term graph, its simplified version, and two graphs with sharing

Term graphs can be denoted visually in an intuitive way. For example, using \( \varSigma \) from Example 1, the graph with \( V= \{v_0,\dots ,v_4\} \), \( \texttt{label}= \{ v_0,v_1 \mapsto @,\ v_2 \mapsto \textsf{add}\} \), \( \texttt{succ}= \{ v_0 \mapsto v_1v_4,\ v_1 \mapsto v_2v_3,\ v_2,v_3,v_4 \mapsto \varepsilon \} \) and \( {\varLambda }= v_0 \) is pictured in Figure 1a. We use \( \bot \) to indicate unlabeled vertices and a circle for \( {\varLambda }\). We will typically omit vertex names, as done in Figure 1b. Note that the definition allows multiple vertices to have the same vertex as successor; these successor vertices with in-degree \( > 1 \) are shared. Two examples are denoted in Figures 1c and 1d.

Each term has a natural representation as a tree. Formally, for a term \( s \) we let \( [s]_{\mathbb {G}} = (\texttt{pos}(s),\texttt{label},\texttt{succ},\sharp ) \) where \( \texttt{label}(p) = @\) if \( s|_p = s_1 s_2 \) and \( \texttt{label}(p) = \textsf{f}\) if \( s|_p = \textsf{f}\); \( \texttt{label}(p) \) is not defined if \( s|_p \) is a variable; and \( \texttt{succ}(p) = (1 \cdot p)(2 \cdot p) \) if \( s|_p = s_1 \,s_2 \) and \( \texttt{succ}(p) = \varepsilon \) otherwise. Essentially, \( [s]_{\mathbb {G}} \) maintains the positioning structure of \( s \) and forgets variable names. For example, Figure 1b denotes both \( [\textsf{add}\,x \,y]_{\mathbb {G}} \) and \( [\textsf{add}\,x \,x]_{\mathbb {G}} \).
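Definition 7 and the tree \( [s]_{\mathbb {G}} \) can be made concrete. The following Python encoding (illustrative, not the paper's formalism) represents terms as nested tuples and vertices as positions, leaving variable vertices unlabeled:

```python
# Build the tree [s]_G of a term s.  Terms are nested tuples:
#   ('app', s1, s2)  application,  ('sym', f)  function symbol,
#   ('var', x)       variable (becomes an unlabeled vertex).
# Vertices are positions; i . p is encoded by prepending i to the tuple p.

def term_to_graph(s):
    label, succ = {}, {}
    def walk(t, p):
        if t[0] == 'app':
            label[p] = '@'
            succ[p] = [(1,) + p, (2,) + p]
            walk(t[1], (1,) + p)
            walk(t[2], (2,) + p)
        elif t[0] == 'sym':
            label[p] = t[1]     # function symbol, no successors
            succ[p] = []
        else:
            succ[p] = []        # variable: vertex stays unlabeled
    walk(s, ())
    return set(succ), label, succ, ()   # (V, label, succ, root)

# add x y  ==  (add x) y
t = ('app', ('app', ('sym', 'add'), ('var', 'x')), ('var', 'y'))
V, label, succ, root = term_to_graph(t)
assert label[root] == '@' and len(V) == 5
```

Note that `term_to_graph` forgets variable names, so \( \textsf{add}\,x\,y \) and \( \textsf{add}\,x\,x \) yield the same graph, as in Figure 1b.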

Our next step is to reduce term graphs using rules. We limit interest to left-linear rules, which includes all rules in \( \mathcal {R}_{+f}\) (as \( \mathcal {R}\) is orthogonal, and the rules in \( \mathcal {R}_f \) are ground). To define reduction, we will need some helper definitions.

Definition 8

Let \( G= (V,\texttt{label},\texttt{succ},{\varLambda }) \) and \( v \in V\). The subgraph \( \textsf{reach}(G,v) \) of \( G\) rooted at \( v \) is the term graph \( (V',\texttt{label}',\texttt{succ}',v) \) where \( V' \) contains those \( v' \in V\) such that a path from \( v \) to \( v' \) exists, and \( \texttt{label}', \texttt{succ}' \) are the respective restrictions of \( \texttt{label}\) and \( \texttt{succ}\) to \( V' \).

Definition 9

A homomorphism between two term graphs \( G\) and \( H\) is a function \( \phi : V_G\longrightarrow V_H\) with \( \phi ({\varLambda }_{G}) = {\varLambda }_{H} \), and for \( v\in V_G\) such that \( \texttt{label}_G(v) \) is defined, \( \texttt{label}_H(\phi (v)) = \texttt{label}_G(v) \) and \( \texttt{succ}_H(\phi (v)) = \phi (v_1) \ldots \phi (v_k) \) when \( \texttt{succ}_G(v) = v_1 \ldots v_k \). (If \( \texttt{label}_G(v) \) is undefined, \(\texttt{succ}_H(\phi (v)) \) may be anything.)

Definition 10

A redex in \( G\) is a triple \( (\rho ,v,\phi ) \) consisting of some rule \( \rho = \ell \rightarrow r \in \mathcal {R}_{+f}\), a vertex \( v\) in \( V_G\), and a homomorphism \( \phi : [\ell ]_{\mathbb {G}} \longrightarrow \textsf{reach}(G,v) \).

Definition 11

Let \( G\) be a term graph and \( v_1, v_2 \) vertices in \( G\). The redirection of \( v_1 \) to \( v_2 \) is the term graph \( G[v_1 \mathbin {\gg }v_2] \equiv (V_G, \texttt{label}_G, \texttt{succ}_{G'}, {\varLambda }_G') \) with

$$ {\texttt{succ}_{G'}(v)}_i = {\left\{ \begin{array}{ll} v_2, &{} \text {if } {\texttt{succ}_G(v)}_i = v_1 \\ {\texttt{succ}_G(v)}_i, &{} \text {otherwise} \end{array}\right. } \quad {\varLambda }_G' = {\left\{ \begin{array}{ll} v_2 &{} \text {if } {\varLambda }_G= v_1 \\ {\varLambda }_G&{} \text {otherwise} \\ \end{array}\right. } $$

That is, we replace every reference to \( v_1 \) by a reference to \( v_2 \). With these definitions in hand, we can define contraction of term graphs:

Definition 12

Let \( G\) be a term graph, and \( (\rho ,v,\phi ) \) a redex in \( G\) with \( \rho = \ell \rightarrow r \in \mathcal {R}_{+f}\), such that no other vertex \( v' \) in \( \textsf{reach}(G,v) \) admits a redex (so \( v \) is an innermost redex position). Write \( a_x \) for the position of variable \( x \) in \( \ell \), and recall that \( a_x \) is a vertex in \( [\ell ]_{\mathbb {G}} \). By left-linearity, \( a_x \) is unique for \( x \in \texttt{vars}(\ell ) \). The contraction of \( (\rho , v, \phi ) \) in \( G\) is the term graph \( J\) produced after the following steps: \( H\) (building), \( I\) (redirection), and \( J\) (garbage collection).

  • (building) Let \( H= (V_H,\texttt{label}_H, \texttt{succ}_H,{\varLambda }_G) \) where:

    • \( V_H= V_G\uplus \{ \overline{p} \in \texttt{pos}(r) \mid r|_p\ \text {is not a variable} \} \) (\( \uplus \) means disjoint union);

    • for \( v\in V_G\): \( \texttt{label}_H(v) = \texttt{label}_G(v) \) and \( \texttt{succ}_H(v) = \texttt{succ}_G( v) \)

    • for \( p \in V_H\) with \( r|_p \) not a variable:

      • \( \texttt{label}_H(\overline{p}) = \textsf{f}\) if \( r|_p = \textsf{f}\) and \( \texttt{label}_H(\overline{p}) = @\) otherwise

      • \( \texttt{succ}_H(\overline{p}) = \varepsilon \) if \( r|_p = \textsf{f}\); otherwise, \( \texttt{succ}_H(\overline{p}) = \psi (1 \cdot p) \psi (2 \cdot p) \), where \( \psi (q) = \overline{q} \) if \( r|_q \) is not a variable, and \( \psi (q) = \phi (a_x) \) if \( r|_q = x \).

  • (redirection) If \( r \) is a variable \( x \) (so \( H= G\)), then let \( I= G[v\mathbin {\gg }\phi (a_x)] \). Otherwise, let \( I= H[v\mathbin {\gg }\overline{\sharp }] \), so with all references to \( v\) redirected to the root vertex for \( r \).

  • (garbage collection) Let \( J:= \textsf{reach}(I,{\varLambda }_I) \) (so remove unreachable vertices).

We then write \( G\rightsquigarrow J\) in one step, and \( G\rightsquigarrow ^n J\) for the n-step reduction.
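The redirection step \( G[v_1 \mathbin {\gg }v_2] \), the core graph surgery used in contraction, can be sketched on a dictionary-based representation (illustrative, not the paper's formalism):

```python
# G[v1 >> v2]: every successor reference to v1 becomes v2, and the root
# moves to v2 if it was v1.  `succ` maps each vertex to its successor list.

def redirect(succ, root, v1, v2):
    new_succ = {v: [v2 if w == v1 else w for w in ws]
                for v, ws in succ.items()}
    return new_succ, (v2 if root == v1 else root)

# tiny example: a -> [b, b]; redirect b to a fresh vertex c
succ = {'a': ['b', 'b'], 'b': [], 'c': []}
new_succ, new_root = redirect(succ, 'a', 'b', 'c')
assert new_succ['a'] == ['c', 'c'] and new_root == 'a'
```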

We illustrate this with two examples. First, we aim to rewrite the graph of Figure 2a with a rule \( \textsf{add}\,\textsf{0}\,y \rightarrow y \) at vertex \( v\). Since the right-hand side is a variable, the building phase does nothing. The result of the redirection phase is given in Figure 2b, and the result of the garbage collection in Figure 2c.

Fig. 2. Reducing a graph with the rule \( \textsf{add}\,\textsf{0}\,y \rightarrow y \)

Second, we consider a reduction by \( \textsf{mult}\,(\textsf{s}\,x) \,y \rightarrow \textsf{add}\,y \,(\textsf{mult}\,x \,y) \). Figure 3a shows the result of the building phase, with the vertices and edges added during this phase in red. Redirection sets the root to the squared node (the root of the right-hand side), and the result after garbage collection is in Figure 3b.

Fig. 3. Reducing a term graph with substantial sharing

Note that, even when a term graph \( G\) is not a tree, we can find a corresponding term: we assign a variable \( var (v) \) to each unlabeled vertex \( v\) in \( G\), and let:

$$ \theta (v) = \left\{ \begin{array}{ll} \theta (v_1) \,\theta (v_2) &{} \text {if}\ \texttt{label}(v) = @\ \text {and}\ \texttt{succ}(v) = v_1v_2 \\ \textsf{f}&{} \text {if}\ \texttt{label}(v) = \textsf{f}\\ var (v) &{} \text {if}\ \texttt{label}(v)\ \text {is undefined} \\ \end{array} \right. $$

Then we may define \( [G]^{-1}_\mathbb {G} = \theta ({\varLambda }_G) \). For a linear term, clearly \( [[s]_{\mathbb {G}}]^{-1}_\mathbb {G} = s \) (modulo variable renaming). We make the following observation:

Lemma 4

Assume given a term graph \( G\) such that there is a path from \( {\varLambda }_G\) to every vertex in \( V_G\), and let \( [G]^{-1}_\mathbb {G} = s \). If \( G\rightsquigarrow H\) then \( [G]^{-1}_\mathbb {G} \rightarrow _{\mathcal {R}_{+f}}^+ [H]^{-1}_\mathbb {G} \). Moreover, if \( s \rightarrow _{\mathcal {R}_{+f}} t \) for some \( t \), then there exists \( H\) such that \( G\rightsquigarrow H\).

Consequently, if \( \rightarrow _{\mathcal {R}_{+f}} \) is terminating, then so is \( \rightsquigarrow \); and if \( [s]_{\mathbb {G}} \rightsquigarrow ^n G\) for some ground term \( s \) then \( s \rightarrow _{\mathcal {R}_{+f}}^* [G]^{-1}_\mathbb {G} \) in at least \( n \) steps. Notice that if \( G\) does not admit any redex, then \( [G]^{-1}_\mathbb {G} \) is in normal form. Moreover, since \( \mathcal {R}_{+f}= \mathcal {R}\cup \mathcal {R}_f \) is orthogonal (as \( \mathcal {R}\) is orthogonal and the \( \mathcal {R}_f \) rules are non-overlapping) and therefore confluent, this is the unique normal form of \( s \). We conclude:

Corollary 1

If \( [\textsf{F}\,\textsf{S}_f\,\underline{\textsf{w}}]_{\mathbb {G}} \rightsquigarrow ^n G\), then \( n \le D(|f|,|w|) \); and if \( G\) is in normal form, then \( [G]^{-1}_\mathbb {G} = \underline{\mathsf {\varPsi (f,w)}} \).
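The readback \( [G]^{-1}_\mathbb {G} \) used above can be sketched as follows (dictionary-based representation, illustrative; shared vertices are simply read out twice):

```python
# Readback of a term graph: unlabeled vertices get fresh variables
# (reused when the vertex is shared); applications become pairs.

def readback(label, succ, v, names=None):
    names = {} if names is None else names
    if v not in label:                       # unlabeled: a variable
        return names.setdefault(v, 'x' + str(len(names)))
    if label[v] == '@':                      # application vertex
        v1, v2 = succ[v]
        return (readback(label, succ, v1, names),
                readback(label, succ, v2, names))
    return label[v]                          # a function symbol

# graph of add x x with the variable vertex v shared:
label = {'r': '@', 'a': '@', 'f': 'add'}
succ = {'r': ['a', 'v'], 'a': ['f', 'v'], 'f': [], 'v': []}
assert readback(label, succ, 'r') == (('add', 'x0'), 'x0')
```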

4.4 Bringing Everything Together

We are now ready to complete the soundness proof following the recipe at the start of the section. Towards the third bullet point, we make the following observation.

Lemma 5

There is a constant \( a \) such that, whenever \( G\rightsquigarrow H\) by a rule in \( \mathcal {R}\), then \( |H| \le |G| + a \), where \( |G| \) denotes the total number of nodes in the graph \( G\).

Proof

In a step using a rule \( \ell \rightarrow r \), the number of nodes in the graph can be increased at most by \( |[r]_{\mathbb {G}}| \). As there are only finitely many rules in \( \mathcal {R}\), we can let \( a \) be the number of nodes in the largest graph for a right-hand side \( r \).    \(\square \)

To see that graph rewriting with \( \textsf{S}_f \) can be implemented in an efficient way, we observe that the size of any intermediary graph in a reduction \( [\textsf{F}\,\textsf{S}_f\,\underline{\textsf{w}}]_{\mathbb {G}} \rightsquigarrow ^* H\) is bounded by a second-order polynomial over \( |f|, |w| \):

Lemma 6

There is a second-order polynomial \( Q \) such that if \( [\textsf{F}\,\textsf{S}_f\,\underline{\textsf{w}}]_{\mathbb {G}} \rightsquigarrow ^* H\), then \( |H| \le Q(|f|, |w|) \).

Proof

Let \( Q(F,x) := x + D(F,x) * (a + F(B(F, x))) \), where \( D \) is the polynomial from Lemma 1, \( a \) is the constant from Lemma 5, and \( B \) is the polynomial from Section 4.2. This suffices, because there are at most \( D(|f|,|w|) \) steps (Lemma 1, Corollary 1), each of which increases the graph size by at most \( \max (a,|f|(B(|f|,|w|))) \).    \(\square \)
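The bound of Lemma 6 is easy to instantiate. In the sketch below, `D`, `a`, and `B` are placeholders for the polynomial of Lemma 1, the constant of Lemma 5, and the bound of Section 4.2 (all sample values are illustrative):

```python
# Q(F, x) = x + D(F, x) * (a + F(B(F, x))): starting size |w|, at most
# D steps, each adding at most a vertices or one oracle output of
# length at most F(B(F, x)).

def graph_size_bound(F, x, D, a, B):
    return x + D(F, x) * (a + F(B(F, x)))

# sample instantiation (all values illustrative)
F = lambda n: n + 1
bound = graph_size_bound(F, 10, D=lambda F, x: x * x, a=3, B=lambda F, x: 2 * x)
assert bound == 10 + 100 * (3 + 21)
```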

All in all, we are finally ready to prove the soundness side of the main theorem:

Theorem 3

Let \( \mathcal {R}\) be a finite orthogonal STRS admitting a polynomially bounded interpretation. If \( \textsf{F}\) computes a type-2 functional \( \varPsi \), then \( \varPsi \in \texttt{BFF}\).

Proof

Given \( {(\mathbb {F},\mathcal {R})}\), we can construct an OTM \( M\) so that for a given \( f\in W\longrightarrow W\), the machine \( M_f \) executed on \( w \in W\) computes the normal form of \( \textsf{F}\,\textsf{S}_f\,\underline{\textsf{w}} \) under \( \rightarrow _{\mathcal {R}_{+f}} \) using graph rewriting. We omit the exact construction, but observe:

  • that we can represent each graph in polynomial space in the size of the graph;

  • that we can do a rewriting step that does not call the oracle (so using a rule in \( \mathcal {R}\)) following the contraction algorithm we defined in Definition 12, which is clearly feasible to do in polynomial time in the size of the graph;

  • and that each oracle call (implemented in rewriting by an \( \mathcal {R}_f\)-step \( \textsf{S}_f\,\underline{\textsf{x}} \rightarrow \underline{\textsf{y}} \)) is resolved by copying \( \underline{\textsf{x}} \) to the query tape, transitioning to the query state, and from the answer state copying \( \underline{\textsf{y}} \) from the answer tape to the main tape. By Lemma 3 this is doable in polynomial time in \( |f|, |w| \) and the graph size.

By Lemma 6, graph sizes are bounded by a polynomial over \( |f|, |w| \), so using the above reasoning, the same holds for the cost of each reduction step. In summary: the total cost of \( M_f\) running on \( w \) is bounded by a second-order polynomial in terms of \( |f| \) and \( |w| \). As \( M_f\) simulates \( \mathcal {R}_{+f}\) via graph rewriting and \( \mathcal {R}_{+f}\) computes \( \varPsi \), \( M\) also computes \( \varPsi \). By Definition 3, \( \varPsi \) is in \( \texttt{BFF}{} \).    \(\square \)

5 Completeness

Recall from Section 3 that to prove completeness we have to show the following: if a given type-2 functional \( \varPsi \) is in \(\texttt{BFF}\), then there exists an orthogonal STRS that computes \( \varPsi \) and admits a polynomially bounded interpretation. We prove this by providing an encoding of OTMs as STRSs that admit a polynomially bounded interpretation.

The encoding is divided into three steps. In Section 5.1, we define the function symbols that allow us to encode any possible machine configuration as a term. In Section 5.2, we encode transitions as reduction rules that rewrite configuration terms. Lastly, we design an STRS to simulate a complete execution of an OTM in polynomially many steps. Achieving this polynomial bound is non-trivial and is done in Sections 5.3 and 5.4.

Henceforth, we assume given a fixed OTM \( M\), and a second-order polynomial \( P_M\), such that \( M\) operates in time \( P_M\). For simplicity, we assume the machine has only three tapes (one input/output tape, one query tape, one answer tape); that each non-oracle transition only operates on one tape (i.e., reading/writing and moving the tape head); and that we only have tape symbols \( \{ 0,1,\texttt{B}\} \).

5.1 Representing Configurations

Following Section 3, we have \( \textsf{o},\textsf{i}\mathrel {:}\textsf{bit},\ \mathsf {::}\mathrel {:}\textsf{bit}\Rightarrow \textsf{word}\Rightarrow \textsf{word}\) and \( \mathsf {[]}\mathrel {:}\textsf{word}\). To represent a (partial) tape, we also introduce \( \textsf{b}\mathrel {:}\textsf{bit}\) for the blank symbol. Now for instance a tape with content \( 011\texttt{B}01\texttt{B}\texttt{B}\cdots \) (followed by infinitely many blanks) may be represented as the list \( [\textsf{o};\textsf{i};\textsf{i};\textsf{b};\textsf{o};\textsf{i}] \) of type \( \textsf{word}\). We may also add an arbitrary number of blanks at the end of the representation; e.g., \( [\textsf{o};\textsf{i};\textsf{i};\textsf{b};\textsf{o};\textsf{i};\textsf{b};\textsf{b}] \).

We can think of a tape configuration — the combination of a tape and the position of the tape head — as a finite word \( w_1 \ldots w_{p - 1} {\#}w_p w_{p + 1} \ldots w_k \) (followed by infinitely many blanks). Here, the tape’s head is reading the symbol \( w_p \). We can split this tape into two components: the left word \( w_1 \ldots w_{p - 1} \), and the right word \( w_p \ldots w_k \). To represent a tape configuration, we introduce three symbols:

$$\begin{aligned} \textsf{L}&\mathrel {:}\textsf{word}\Rightarrow \textsf{left}&\textsf{R}&\mathrel {:}\textsf{word}\Rightarrow \textsf{right}&\textsf{split}&\mathrel {:}\textsf{left}\Rightarrow \textsf{right}\Rightarrow \textsf{tape}\end{aligned}$$

Here, \( \textsf{L}, \textsf{R}\) hold the content of the left and right split of the tape, respectively. While we technically do not need these two constructors (we could have \( \textsf{split}\mathrel {:}\textsf{word}\Rightarrow \textsf{word}\Rightarrow \textsf{tape}\)), they serve to make configurations more human-readable. For convenience when rewriting transitions later on, we will encode the left side of the split in reverse order. Specifically, we encode \( w_1 \ldots w_{p - 1} {\#}w_p w_{p + 1} \ldots w_k \) as

$$ \textsf{split}\,(\textsf{L}\,[w_{p-1};\dots ;w_2 ;w_1]) \,(\textsf{R}\,[w_p;\dots ;w_{k-1};w_k]) $$

The symbol currently being read is the first element of the list below \( \textsf{R}\); in case of \( \textsf{R}\,\mathsf {[]}\), this symbol is \( \texttt{B}\). For a concrete example, a tape configuration \( 1\texttt{B}0{\#}10 \) is represented by: \( \textsf{split}\,(\textsf{L}\,[\textsf{o};\textsf{b};\textsf{i}]) \,(\textsf{R}\,[\textsf{i};\textsf{o}]) \). Since we have assumed an OTM with three tapes, a configuration of the machine at any moment is a tuple \( (q, t_1,t_2,t_3) \), with \( q\) a state and \( t_1,t_2,t_3 \) tape configurations. To represent machine configurations, we introduce, for each state \( q\), a symbol \( \textsf{q}\mathrel {:}\textsf{tape}\Rightarrow \textsf{tape}\Rightarrow \textsf{tape}\Rightarrow \textsf{config}\). Thus, a configuration \( (q,t_1,t_2,t_3) \) is represented by a term \( \textsf{q}\,T_1 \,T_2 \,T_3 \).
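The split representation can be sketched in code (illustrative; Python lists stand in for words, with the left part stored reversed and the head reading the first symbol of the right part, or `'B'` when that part is empty):

```python
# Split-tape representation: w1 .. w_{p-1} # w_p .. w_k becomes
# ([w_{p-1}, ..., w_1], [w_p, ..., w_k]).

def make_tape(word, head):
    """word is w1..wk as a list; head is the 1-based head position."""
    return (list(reversed(word[:head - 1])), list(word[head - 1:]))

def read_head(tape):
    left, right = tape
    return right[0] if right else 'B'   # R [] means the head reads a blank

# the configuration 1 B 0 # 1 0 from the text:
tape = make_tape(['1', 'B', '0', '1', '0'], 4)
assert tape == (['0', 'B', '1'], ['1', '0'])
assert read_head(tape) == '1'
```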

Example 5

The initial configuration for a machine \( M_f \) on input \( w \) is a tuple \( (q_0,{\#}w,{\#}\texttt{B},{\#}\texttt{B}) \). This is represented by the term

$$ \textsf{initial}(w) := \mathtt {q_0}\,(\textsf{split}\,(\textsf{L}\,\mathsf {[]}) \,(\textsf{R}\,\underline{\textsf{w}})) \,(\textsf{split}\,(\textsf{L}\,\mathsf {[]}) \,(\textsf{R}\,\mathsf {[]})) \,(\textsf{split}\,(\textsf{L}\,\mathsf {[]}) \,(\textsf{R}\,\mathsf {[]})) $$

To interpret the symbols from this section, we let \( (\mathcal {S}_{\iota },\sqsupseteq _{\iota }) := (\mathbb {N},\ge ) \) for all \( \iota \), let \( \mathcal {J}_{\textsf{f}}^\textsf{c} = \lambda \!\!\!\lambda x_1 \dots x_m.0 \) whenever \( \textsf{f}\) takes \( m \) arguments, and for the sizes:

$$ \begin{array}{rclcrclcrclcrclcr} \mathcal {J}_{\textsf{o}}^\textsf{s} &{} = &{} 0 &{} \quad &{} \mathcal {J}_{\textsf{b}}^\textsf{s} &{} = &{} 0 &{} \quad &{} \mathcal {J}_{\textsf{L}}^\textsf{s} &{} = &{} \lambda \!\!\!\lambda x.x &{} \quad &{} \mathcal {J}_{\mathsf {::}}^\textsf{s} &{} = &{} \lambda \!\!\!\lambda xy.x+y+1 &{} \quad &{} \mathcal {J}_{\texttt{q}}^\textsf{s} = \lambda \!\!\!\lambda xyz.x+y \\ \mathcal {J}_{\textsf{i}}^\textsf{s} &{} = &{} 0 &{} \quad &{} \mathcal {J}_{\mathsf {[]}}^\textsf{s} &{} = &{} 0 &{} \quad &{} \mathcal {J}_{\textsf{R}}^\textsf{s} &{} = &{} \lambda \!\!\!\lambda x.x &{} \quad &{} \mathcal {J}_{\textsf{split}}^\textsf{s} &{} = &{} \lambda \!\!\!\lambda xy.x+y &{} \quad &{}\text {(for all states q )} \\ \end{array} $$

Hence, \( {\llbracket {} \underline{\textsf{w}} \rrbracket }^\textsf{s} = |w| \), which satisfies the requirements of Theorem 2; the size of a tape configuration \( w_1\ldots w_{p-1}{\#}w_p\ldots w_k \) is \( k \), and the size of a configuration is the size of its first and second tapes combined. We do not include the third tape, as it affects neither the result yielded by the final configuration (which is read from the first tape) nor the size of any word the oracle \( f \) is applied to.

5.2 Executing The Machine

A single step in an OTM can either be an oracle call (a transition from the \( \texttt{query} \) state to the \( \texttt{answer} \) state), or a traditional step: we assume that an OTM \( M\) has a fixed set \( \mathcal {T}\) of transitions \( q {\Rightarrow }_t^{r/i,\ d} l \) where \( q \) is the input state, \( l \) the output state, \( t \in \{1,2,3\} \) the tape considered (recall that we have assumed that a non-oracle transition only operates on one tape), \( r,i \in \{ 0, 1, \texttt{B}\} \) respectively the symbol being read and the symbol being written, and \( d \in \{L,R\} \) the direction for the read head of tape \( t \) to move. We will model the computation of \( M\) as rules that simulate the small-step semantics of the machine.

To encode a single transition, let \( \textsf{step}\mathrel {:}(\textsf{word}\Rightarrow \textsf{word}) \Rightarrow \textsf{config}\Rightarrow \textsf{config}\). For any transition of the form \( q {\Rightarrow }_1^{r/i,\ L} l \) (so a transition operating on tape 1, and moving left), we introduce a rule (where we write \( \underline{\textsf{0}} = \textsf{o},\ \underline{\textsf{1}} = \textsf{i},\ \underline{\mathsf {\texttt{B}}} = \textsf{b}\)):

$$ \textsf{step}\,F\,( \texttt{q}\,( \textsf{split}\,(\textsf{L}\,(x\mathsf {::}y)) \,(\textsf{R}\,(\underline{\textsf{r}} \mathsf {::}z)) ) \,u \,v ) \rightarrow \texttt{l}\,( \textsf{split}\,(\textsf{L}\,y) \,(\textsf{R}\,(x\mathsf {::}\underline{\textsf{i}} \mathsf {::}z)) ) \,u \,v $$

Moreover, for transitions \( q {\Rightarrow }_1^{\texttt{B}/i,\ L} l \) (so where \( \texttt{B}\) is read), we add a rule:

$$ \textsf{step}\,F\,( \texttt{q}\,( \textsf{split}\,(\textsf{L}\,(x\mathsf {::}y)) \,(\textsf{R}\,\mathsf {[]}) ) \,u \,v ) \rightarrow \texttt{l}\,( \textsf{split}\,(\textsf{L}\,y) \,(\textsf{R}\,(x\mathsf {::}\underline{\textsf{i}}\mathsf {::}\mathsf {[]})) ) \,u \,v $$

These rules respectively handle the steps where a tape configuration is changed from \( u_1\ldots u_{p-1}u_p {\#}r u_{p+2} \ldots u_k \) to \( u_1\ldots u_{p-1}{\#}u_p i u_{p+2} \ldots u_k \), and where a tape configuration is changed from \( u_1\ldots u_k {\#}\) to \( u_1\ldots u_{k-1}{\#}u_k i \).

Transitions where \( d = R\), or on the other two tapes, are encoded similarly.

Next, we encode oracle calls. Recall that, to query the oracle for the value of \( f \) at \( u \), we write \( u \) on the second tape, move its head to the leftmost position, and enter the query state. Then, the content of this tape is erased and the image of \( f \) at \( u \) is written on the third tape. Visually, this step is represented as:

$$ (\texttt{query}, \langle \text {tape}_1\rangle , v_1 \ldots v_p{\#}\underline{\textsf{u}} \texttt{B}\ldots , \langle \text {tape}_3\rangle ) \rightsquigarrow (\texttt{answer}, \langle \text {tape}_1\rangle , {\#}\texttt{B}, {\#}\underline{\mathsf {f(u)}} ) $$

This is implemented by the following rules:

(Figure: rules encoding the \( \texttt{query}\rightsquigarrow \texttt{answer}\) transition, which apply \( F \) to the cleaned content of the query tape.)
$$ \begin{array}{rclcrclcrcl} \textsf{clean}\,(\textsf{o}\mathsf {::}x) &{} \rightarrow &{} \textsf{o}\mathsf {::}(\textsf{clean}\,x) &{} \quad &{} \textsf{clean}\,(\textsf{b}\mathsf {::}x) &{} \rightarrow &{} \mathsf {[]}\\ \textsf{clean}\,(\textsf{i}\mathsf {::}x) &{} \rightarrow &{} \textsf{i}\mathsf {::}(\textsf{clean}\,x) &{} \quad &{} \textsf{clean}\,\mathsf {[]}&{} \rightarrow &{} \mathsf {[]}\\ \end{array} $$

Here, \( \textsf{clean}\mathrel {:}\textsf{word}\Rightarrow \textsf{word}\) turns a word that may have blanks in it into a bitstring, by reading until the next blank; for instance replacing \( [\textsf{o};\textsf{i};\textsf{b};\textsf{i}] \) by \( [\textsf{o};\textsf{i}] \).
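The four \( \textsf{clean}\) rules translate directly into a recursive function; a sketch in Python (symbol names \( \textsf{o},\textsf{i},\textsf{b} \) are assumptions matching the notation above):

```python
def clean(word):
    """Mirror of the four clean rules: copy bits until the first blank."""
    if not word:               # clean []     -> []
        return []
    if word[0] == 'b':         # clean (b::x) -> []
        return []
    return [word[0]] + clean(word[1:])   # clean (o::x) and clean (i::x)

print(clean(['o', 'i', 'b', 'i']))  # ['o', 'i']
```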

The various \( \textsf{step}\) rules, as well as the \( \textsf{clean}\) rules, are non-overlapping because we consider deterministic OTMs. They are also left-linear, and are oriented using:

$$ \begin{array}{rclcrcl} \mathcal {J}_{\textsf{clean}}^\textsf{s} &{} = &{} \lambda \!\!\!\lambda x.x &{} \quad &{} \mathcal {J}_{\textsf{clean}}^\textsf{c} &{} = &{} \lambda \!\!\!\lambda x.x+1 \\ \mathcal {J}_{\textsf{step}}^\textsf{s} &{} = &{} \lambda \!\!\!\lambda Fx.x + 1 &{} \quad &{} \mathcal {J}_{\textsf{step}}^\textsf{c} &{} = &{} \lambda \!\!\!\lambda F^c F^s x. F^c(x) + x + 2 \\ \end{array} $$

(Note that \( \mathcal {J}_{\textsf{step}}^\textsf{s} \) is so simple because the size of a configuration does not include the size of the answer tape.) From these rules, the following result is immediate:

Lemma 7

Let \( M_f \) be an OTM and \( C, C' \) be machine configurations of \( M_f \) such that \( C \rightsquigarrow C' \). Then \( \textsf{step}\,\textsf{S}_f \,{[C]} \mathrel {\rightarrow _\mathcal {R}^{+}}{[C']} \), where \( {[C]} \) is the term encoding of \(C\).

5.3 A Bound on the Number of Steps

To generalize from a single step of the machine to tracing a full computation, the natural idea would be to define rules such as:

$$ \begin{array}{rcll} \textsf{execute}\,F \,(\texttt{q}\,x \,y \,z) &{} \rightarrow &{} \textsf{execute}\,F \,(\textsf{step}\,F \,(\texttt{q}\,x \,y \,z)) &{} \text {for}\ \texttt{q}\ne \texttt{end}\\ \textsf{execute}\,F \,(\texttt{end}\,(\textsf{split}\,(\textsf{L}\,x) \,(\textsf{R}\,w)) \,y \,z) &{} \rightarrow &{} \textsf{clean}\,w \\ \end{array} $$

Then, reducing \( \textsf{execute}\,\textsf{S}_f\,\textsf{initial}(w) \) to normal form simulates a full OTM execution of \( M_f \) on input \( w \). Unfortunately, the first rule does not admit an interpretation, as it may be non-terminating. A solution is to give \( \textsf{execute}\) an additional argument \( \ulcorner \textsf{N}\urcorner \) indicating that execution takes at most \( N \) steps; this argument ensures termination, and can be used to find an interpretation.

The challenge, however, is to compute a bound on the number of steps in the OTM: the obvious thought is to compute \( P_M(|f|,|w|) \), but this cannot in general be done in polynomial time because the STRS does not have access to \( |f| \): since \( |f|(i) = \max \{ |f(x)| \mid x \in W\wedge |x| \le i \} \), there are exponentially many words \( x \) to consider.

To solve this, and following [22, Proposition 2.3], we observe that it suffices to know a bound for \( f(x) \) for only those \( x \) on which the oracle is actually questioned. That is, for \( A \subseteq W\), let \( |f|_{A} = \lambda \!\!\!\lambda n. \max \{ |f(x)| \mid x \in A \wedge |x| \le n \} \). Then:
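The restricted size bound \( |f|_{A} \) is straightforward to compute when \( f \) may only be called on members of \( A \); a small sketch (the oracle used here is a hypothetical toy example):

```python
def restricted_size(f, A, n):
    """|f|_A(n) = max{ |f(x)| : x in A and |x| <= n }; the max over
    the empty set is taken to be 0."""
    return max((len(f(x)) for x in A if len(x) <= n), default=0)

f = lambda x: x + x                  # toy oracle: doubles its input
A = ['0', '01', '0110']
print(restricted_size(f, A, 2))      # only '0' and '01' qualify: max(2, 4) = 4
```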

Lemma 8

Suppose an OTM \( M_f \) runs in time bounded by \( P_M(|f|,|w|) \) on input \( w \). If \( M_f \) transitions in \( N \) steps from its initial state to some configuration \( C \), calling the oracle only on words in \( A \subseteq W\), then \( N \le P_M(|f|_{A},|w|) \).

Proof (Sketch)

We construct \( f' \) with \( f'(x) = 0 \) if \( x \notin A \) and \( f'(x) = f(x) \) if \( x \in A \). Then \( |f'| = |f|_{A} \), and \( M_{f'} \) runs the same on input \( w \) as \( M_f \) does.    \(\square \)
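The construction in the proof can be checked on a toy example: the function \( f' \) that agrees with \( f \) on \( A \) and returns the empty word elsewhere satisfies \( |f'| = |f|_{A} \). A brute-force sketch over bitstrings of bounded length (oracle and bound are hypothetical):

```python
from itertools import product

def words_up_to(n):
    """All bitstrings of length <= n, including the empty word."""
    return [''.join(p) for k in range(n + 1) for p in product('01', repeat=k)]

def size(f, n):
    """|f|(n) = max{ |f(x)| : |x| <= n } over all words."""
    return max(len(f(x)) for x in words_up_to(n))

f = lambda x: x + x                          # toy oracle
A = {'0', '01'}
f_prime = lambda x: f(x) if x in A else ''   # the f' of the proof

# |f'| agrees with |f|_A at every length bound:
for n in range(4):
    restricted = max((len(f(x)) for x in A if len(x) <= n), default=0)
    assert size(f_prime, n) == restricted
print('ok')
```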

Now, for \( A \) encoded as a term \( \textsf{A} \) (using symbols \(\mathsf {\emptyset }\mathrel {:}\textsf{set},\ \textsf{setcons}\mathrel {:}\textsf{word}\Rightarrow \textsf{set}\Rightarrow \textsf{set}\)), we can compute \( |f|_{A} \) using the rules below, where we use unary integers as in Example 1 (\( \textsf{0}\mathrel {:}\textsf{nat}, \textsf{s}\mathrel {:}\textsf{nat}\Rightarrow \textsf{nat}\)), and defined symbols \( \textsf{len}\mathrel {:}\textsf{word}\Rightarrow \textsf{nat},\ \textsf{max}\mathrel {:}\textsf{nat}\Rightarrow \textsf{nat}\Rightarrow \textsf{nat},\ \textsf{limit}\mathrel {:}\textsf{word}\Rightarrow \textsf{nat}\Rightarrow \textsf{word},\ \textsf{retif}\mathrel {:}\textsf{word}\Rightarrow \textsf{nat}\Rightarrow \textsf{word}\Rightarrow \textsf{word},\ \textsf{tryapply}\mathrel {:}(\textsf{word}\Rightarrow \textsf{word}) \Rightarrow \textsf{word}\Rightarrow \textsf{nat}\Rightarrow \textsf{nat},\ \textsf{tryall}\mathrel {:}(\textsf{word}\Rightarrow \textsf{word}) \Rightarrow \textsf{set}\Rightarrow \textsf{nat}\Rightarrow \textsf{nat}\). By design, \( \textsf{retif}\,\underline{\textsf{x}} \,\ulcorner \textsf{n}\urcorner \,\underline{\textsf{y}} \) reduces to \( \underline{\textsf{y}} \) if \( |x| \le n \) and to \( \mathsf {[]}\) otherwise; \( \textsf{tryapply}\,\textsf{S}_f\,\underline{\textsf{x}} \,\ulcorner \textsf{n}\urcorner \) reduces to the unary encoding of \( |f|_{\{x\}}(n) \), and \( \textsf{tryall}\,\textsf{S}_f\,\textsf{A} \,\ulcorner \textsf{n}\urcorner \) yields the unary encoding of \( |f|_{A}(n) \).

$$ \begin{array}{rclcrclrcl} \textsf{len}\,\mathsf {[]}&{} \rightarrow &{} \textsf{0}&{} \quad &{} \textsf{len}\,(x \mathsf {::}y) &{} \rightarrow &{} \textsf{s}\,(\textsf{len}\,y) \\ \textsf{max}\,\textsf{0}\,m &{} \rightarrow &{} m &{} \quad &{} \textsf{max}\,(\textsf{s}\,n) \,\textsf{0}&{} \rightarrow &{} \textsf{s}\,n &{} \textsf{max}\,(\textsf{s}\,n) \,(\textsf{s}\,m) &{} \rightarrow &{} \textsf{s}\,(\textsf{max}\,n \,m) \\ \textsf{limit}\,\mathsf {[]}\,n &{} \rightarrow &{} \mathsf {[]}&{} \quad &{} \textsf{limit}\,(x \mathsf {::}y) \,\textsf{0}&{} \rightarrow &{} \mathsf {[]}&{} \textsf{limit}\,(x\mathsf {::}y) \,(\textsf{s}\,n) &{} \rightarrow &{} x \mathsf {::}(\textsf{limit}\,y \,n) \\ \textsf{retif}\,\mathsf {[]}\,n \,z &{} \rightarrow &{} z &{} \quad &{} \textsf{retif}\,(x \mathsf {::}y) \,\textsf{0}\,z &{} \rightarrow &{} \mathsf {[]}&{} \textsf{retif}\,(x \mathsf {::}y) \,(\textsf{s}\,n) \,z &{} \rightarrow &{} \textsf{retif}\,y \,n \,z \end{array} $$
$$ \begin{array}{rclcrcl} &{} &{} &{} &{} \textsf{tryapply}\,F \,a \,n &{} \rightarrow &{} \textsf{len}\,(\textsf{retif}\,a \,n \,(F \,(\textsf{limit}\,a \,n))) \\ \textsf{tryall}\,F \,\mathsf {\emptyset }\,n &{} \rightarrow &{} \textsf{0}&{} \quad &{} \textsf{tryall}\,F \,(\textsf{setcons}\,a \,tl) \,n &{} \rightarrow &{} \textsf{max}\,(\textsf{tryapply}\,F \,a \,n) \,(\textsf{tryall}\,F \,tl \,n) \\ \end{array} $$
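These recursions can be mirrored directly, which also makes it easy to check that \( \textsf{tryall}\) indeed computes \( |f|_{A} \); a Python sketch (words as lists of symbols, toy oracle assumed):

```python
def limit(word, n):
    """limit w n: the first n symbols of w, mirroring the limit rules."""
    return word[:n]

def retif(x, n, z):
    """retif x n z: z if |x| <= n, else the empty word."""
    return z if len(x) <= n else []

def tryapply(f, a, n):
    """|f|_{{a}}(n): |f(a)| if |a| <= n, else 0; limit guards the call to f."""
    return len(retif(a, n, f(limit(a, n))))

def tryall(f, A, n):
    """|f|_A(n) as a fold of max over tryapply, mirroring the tryall rules."""
    return max((tryapply(f, a, n) for a in A), default=0)

f = lambda w: w + w                        # toy oracle
A = [['o'], ['o', 'i'], ['i', 'i', 'o']]
print(tryall(f, A, 2))                     # |f([o,i])| = 4 is the maximum
```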

An interpretation is provided in [5]. Importantly, the \( \textsf{limit}\) function ensures that, in \( \textsf{tryall}\,F \,\textsf{A} \,n \), we never apply \( F \) to a word \( w \) with \( |w| > n \). Therefore we can let \( {\llbracket {} \textsf{A} \rrbracket }^\textsf{s} = |A| \), the number of words in \( A \), and have \( \mathcal {J}_{\textsf{tryall}}^\textsf{s} = \lambda \!\!\!\lambda F a n.F(n) \) and \( \mathcal {J}_{\textsf{tryall}}^\textsf{c} = \lambda \!\!\!\lambda F^c F^s a n.1+a + F^c(n) + 2 * F^s(n) + 2 * n + 6 \).

Now, for a given second-order polynomial \( P \), fixed \( f,n \), and a term \( \textsf {A} \) encoding a set \( A \subseteq W\), we can construct a term \( \varTheta ^P_{\textsf{S}_f;\ulcorner \textsf{n}\urcorner ;\textsf {A}} \) that computes \( P(|f|_{A},n) \) using \( \textsf{tryall}\) and the functions \( \textsf{add},\textsf{mult}\) from Example 1. By induction on \( P \), we have \( {\llbracket {} \varTheta ^P_{\textsf{S}_f;\ulcorner \textsf{n}\urcorner ;\textsf {A}} \rrbracket }^\textsf{s} = P(|f|,n) \), while its cost is bounded by a polynomial over \( |f|,n,|A| \).

5.4 Finalising Execution

Now, we can define execution in a way that can be bounded by a polynomial interpretation. We let \( \textsf{execute}\mathrel {:}(\textsf{word}\Rightarrow \textsf{word}) \Rightarrow \textsf{nat}\Rightarrow \textsf{nnat}\Rightarrow \textsf{nat}\Rightarrow \textsf{set}\Rightarrow \textsf{config}\Rightarrow \textsf{word}\) and will define rules to reduce expressions \( \textsf{execute}\,F \,n \,m \,z \,a \,c \) where

  • \( F \) is the function to be used in oracle calls.

  • \( n - 1 \) is a bound on the number of steps that can be done before the next oracle call (or until the machine completes execution).

  • \( m \) is a natural number representing the number of steps that have been done so far. We use a new sort \( \textsf{nnat}\) with function symbols \( \textsf{o}\mathrel {:}\textsf{nnat}\) and \( \textsf{n}\mathrel {:}\textsf{nnat}\Rightarrow \textsf{nnat}\) because we will let \( \mathcal {S}_{\textsf{nnat}} = (\mathbb {N},\le ) \), i.e., ordered in the other direction. This will be essential to find an interpretation for \( \textsf{execute}\).

  • \( z \) is a unary representation of \( |w| \), where \( w \) is the input to the OTM.

  • \( c \) is the current configuration.

Using helper symbols \( \textsf{F}' \mathrel {:}(\textsf{word}\Rightarrow \textsf{word}) \Rightarrow \textsf{nat}\Rightarrow \textsf{config}\Rightarrow \textsf{word}\), \( \textsf{execute}' \mathrel {:}(\textsf{word}\Rightarrow \textsf{word}) \Rightarrow \textsf{nat}\Rightarrow \textsf{nnat}\Rightarrow \textsf{nat}\Rightarrow \textsf{set}\Rightarrow \textsf{config}\Rightarrow \textsf{word}\), \( \textsf{extract}\mathrel {:}\textsf{tape}\Rightarrow \textsf{word}\) and \( \textsf{minus}\mathrel {:}\textsf{nat}\Rightarrow \textsf{nnat}\Rightarrow \textsf{nat}\), we introduce the rules:

$$ \begin{array}{l} \textsf{F}\,F \,w \rightarrow \textsf{F}' \,F \,(\textsf{len}\,w) \,(\mathtt {q_0}\,(\textsf{split}(\textsf{L}\,\mathsf {[]}) \,(\textsf{R}\,w)) \,(\textsf{split}(\textsf{L}\,\mathsf {[]}) \,(\textsf{R}\,\mathsf {[]})) \,(\textsf{split}(\textsf{L}\,\mathsf {[]}) \,(\textsf{R}\,\mathsf {[]}))) \\ \textsf{F}' \,F \,z \,c \rightarrow \textsf{execute}\,F \,\varTheta ^{P_M+1}_{F;z;\mathsf {\emptyset }} \,\textsf{o}\,z \,\mathsf {\emptyset }\,c \\ \end{array} $$
$$ \begin{array}{l} \textsf{execute}\,F \,(\textsf{s}\,n) \,m \,z \,a \,(\texttt{q}\,t_1 \,t_2 \,t_3) \rightarrow \\ \qquad \qquad \textsf{execute}\,F \,n \,(\textsf{n}\,m) \,z \,a \,(\textsf{step}\,F \,(\texttt{q}\,t_1 \,t_2 \,t_3)) \ \text {for}\ \texttt{q}\notin \{ \texttt{query}, \texttt{end}\} \\ \textsf{execute}\,F \,(\textsf{s}\,n) \,m \,z \,a \,(\texttt{query}\,t_1 \,t_2 \,t_3) \rightarrow \\ \qquad \qquad \textsf{execute}' \,F \,n \,(\textsf{n}\,m) \,z \,(\textsf{setcons}\,(\textsf{extract}\,t_2) \,a) \,(\texttt{query}\,t_1 \,t_2 \,t_3) \\ \textsf{execute}' \,F \,n \,m \,z \,a \,c \rightarrow \textsf{execute}\,F \,(\textsf{minus}\,\varTheta ^{P_M+1}_{F;z;a} \,m) \,m \,z \,a \,(\textsf{step}\,F \,c) \\ \textsf{execute}\,F \,n \,m \,z \,a \,(\texttt{end}\,t_1 \,t_2 \,t_3) \rightarrow \textsf{extract}\,t_1 \\ \textsf{extract}\,(\textsf{split}\,(\textsf{L}\,x) \,(\textsf{R}\,y)) \rightarrow \textsf{clean}\,y \\ \textsf{minus}\,x \,\textsf{o}\rightarrow x \quad \quad \textsf{minus}\,\textsf{0}\,(\textsf{n}\,y) \rightarrow \textsf{0}\quad \quad \textsf{minus}\,(\textsf{s}\,x) \,(\textsf{n}\,y) \rightarrow \textsf{minus}\,x \,y \\ \end{array} $$

That is, an execution of \( \textsf{F}\,\textsf{S}_f\,\underline{\textsf{w}} \) starts by computing the length of \( w \) and \( P_M(|f|_{\emptyset },|w|)+1 \), and uses these as arguments to \( \textsf{execute}\). Each normal transition lowers the number \( n \) of steps we are still allowed to do and increases the number \( m \) of steps we have done. Each oracle transition updates \( A \), lowers \( n \) by one, and then updates it to the new value \( P_M(|f|_{A},|w|) + 1 - m \), since we have already done \( m \) steps. Once we reach the final state, the answer is read off the first tape.
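The budget bookkeeping of these rules can be sketched as a driver loop: \( n \) starts at \( P_M(|f|_{\emptyset},|w|)+1 \), decreases on ordinary steps, and is recomputed after each oracle call from the queries collected so far. A toy sketch, where the machine, its step function, and the polynomial \( P \) are all hypothetical stand-ins:

```python
def run_bounded(step, f, P, w):
    """Sketch of the execute bookkeeping: n is the remaining step budget,
    m the number of steps done, A the set of oracle query words."""
    config = ('q0', w, None)
    A, m = set(), 0
    size_A = lambda k: max((len(f(x)) for x in A if len(x) <= k), default=0)
    n = P(size_A, len(w)) + 1                      # Theta^{P_M+1} with A empty
    while n > 0 and config[0] != 'end':
        if config[0] == 'query':
            A.add(config[1])                       # record the query word
            config = step(config, f)
            m += 1
            n = max(P(size_A, len(w)) + 1 - m, 0)  # recompute the budget
        else:
            config = step(config, f)
            m += 1
            n -= 1
    return config

# Toy machine: move the input to the query tape, call the oracle, end.
def step(config, f):
    state, data, out = config
    if state == 'q0':
        return ('query', data, None)
    if state == 'query':
        return ('end', None, f(data))              # answer written, then end

P = lambda L, n: n + L(n) + 2                      # a valid second-order bound
print(run_bounded(step, lambda x: x + x, P, '01')[2])  # '0101'
```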

For the interpretation, note that the unusual size set of \( \textsf{nnat}\) allows us to choose \( \mathcal {J}_{\textsf{minus}}^\textsf{s} = \lambda \!\!\!\lambda xy.\max (x-y,0) \) without losing monotonicity. Hence, in every step \( \textsf{execute}\,F \,n \,m \,z \,a \,c \), the value \( \max (P_M({\llbracket {} F \rrbracket }^\textsf{s},{\llbracket {} z \rrbracket }^\textsf{s}) + 1 - {\llbracket {} m \rrbracket }^\textsf{s}, {\llbracket {} n \rrbracket }^\textsf{s}) \) decreases by at least one. Since \( {\llbracket {} \varTheta ^{P_M+1}_{F;z;a} \rrbracket }^\textsf{s} = P_M({\llbracket {} F \rrbracket }^\textsf{s}, {\llbracket {} z \rrbracket }^\textsf{s}) \) regardless of \( a \), we can use this component as part of the interpretation. The full interpretation functions for \( \textsf{execute}\) and \( \textsf{F}\) are long and complex, so we do not present them here; they can be found in [5]. We only state the other direction of Theorem 2:

Theorem 4

If \( \varPsi \in \texttt{BFF}{} \), then there exists a finite orthogonal STRS \( \mathcal {R}\) such that \( \textsf{F}\) computes \( \varPsi \) in \( \mathcal {R}\) and \( \mathcal {R}\) admits a polynomially bounded interpretation.

6 Conclusions and Future Work

In this paper, we have shown that \( \texttt{BFF}\) can be characterized through second-order term rewriting systems admitting polynomially bounded cost–size interpretations. This is arguably the first characterization of the basic feasible functionals purely in terms of rewriting-theoretic concepts.

For the purpose of presentation, we have imposed some mild restrictions that we believe are not essential. In future extensions, we aim to lift these restrictions, e.g., by allowing lambda-abstraction, non-base-type rules, and higher-order functions (assuming that \( \textsf{F}\) is still second-order), as well as arbitrary inductive data structures as input.

Another direction we wish to explore is the characterization of polynomial-time complexity for functionals of order strictly higher than two. It is well known that the underlying theory in this case becomes less robust than in type-two complexity. As such, it is not clear which of the existing proposals for complexity classes of higher-order polytime complexity we can hope to capture within our framework.