
1 Introduction

1.1 Category Variables \({\varvec{T}}\) and Variable Vectors \(\lambda {\overline{x}}\) or \(M{\overline{x}}\)

Type-raising rules in combinatory categorial grammars (CCGs) typically introduce category variables, often represented by a bold letter \({\varvec{T}}\) together with an index (\(i\in \mathbb {N}\)) to distinguish between category variables (Steedman 2000).

figure a

The theoretical status of the type-raising rules can be understood in at least two different ways. The first perspective is that the two rules in (1) are not, themselves, rules; rather, they are rule schemata, with \({\varvec{T}}\) being a meta-level variable. Instantiating \({\varvec{T}}\) with an actual syntactic category then defines a countably infinite set of rules. Let us call this perspective, in which \({\varvec{T}}\) is not an object-level expression, category variable as meta-variable. The second, category variable as type variable, understands the whole syntactic calculus of CCG as a weak polymorphic type theory (i.e., polymorphism without quantification over \({\varvec{T}}\)), where category variables \({\varvec{T}}\) are object-level expressions.

Under either view, \({\varvec{T}}\) may be instantiated by, for example, a functional category with an arbitrary number of arguments, but the semantic representation in (1), \(\lambda p.pa\), is invariant. This is because only p’s first argument, which is to be filled with a, matters; the number of other argument slots is irrelevant.

However, there are two situations where the invariance of semantic representation for \({\varvec{T}}\) cannot be maintained. Those cases require the use of lambda calculus with terms whose argument slots are both “expandable” and “shrinkable.” Variable vectors \(\overline{x}\) that appear in terms of the form \(\lambda \overline{x}.M\) or \(M\overline{x}\) are typical tools for this purpose (Steedman 2000). The first situation is coordination between type-raised NPs. A typical lexical entry for quantifier NPs in English, such as every N and some N (with the accusative case), has the syntactic category \({\varvec{T}}\backslash ({\varvec{T}}/ NP )\), and these entries participate in the coordinated structure in the following way.

figure b

The resulting semantic representation, \(\lambda \overline{x}.(f\overline{x}\wedge g\overline{x})\), shows the two usages of, and the necessity of, a variable vector \(\overline{x}\): when used with \(\lambda \) in the form \(\lambda \overline{x}.M\), it is a function that takes an arbitrary number of arguments, enough that the corresponding syntactic category becomes S. When used with a function f (or g), in the form \(f\overline{x}\) (or \(g\overline{x}\)), it stands for the result of applying f (or g) to the sequence of arguments taken by the corresponding binder \(\lambda \overline{x}\), in the same order.

The reason why variable vectors are needed in (2), unlike in (1), is that \(\wedge \) is a truth function that conjoins only propositions. In (2), \(f\overline{x}\) and \(g\overline{x}\) are safely conjoined by the truth function \(\wedge \) because the corresponding category for \(f\overline{x}\) and \(g\overline{x}\) is S, which ensures that their semantic type is proposition. However, the number of arguments needed for f and g to become propositions depends on the syntactic category that instantiates \({\varvec{T}}\). Therefore, we need a variable vector \(\overline{x}\) in semantic representations whenever we want to leave the number of arguments unspecified. For example, the following two CCG derivations instantiate (2), which uniformly describes the coordination calculus therein.

figure c

The example (3) shows a coordinated structure between two direct objects in English, for which \({\varvec{T}}\equiv S\backslash NP \) and \(\overline{x}\equiv (p,x)\); namely, \(\lambda \overline{x}.\equiv \lambda p.\lambda x.\) and \(f\overline{x}\equiv ((fp)x)\).

figure d

The example (4) shows a coordinated structure between two indirect objects in English, for which \({\varvec{T}}\equiv S\backslash NP / NP \) and \(\overline{x}\equiv (p,y,x)\); namely, \(\lambda \overline{x}.\equiv \lambda p.\lambda y.\lambda x.\) and \(f\overline{x}\equiv (((fp)y)x)\).

The second situation in which variable vectors are needed is semantic representation of quantifiers.

figure e

Since the truth functions \(\rightarrow \) and \(\wedge \) conjoin only propositions, \((px)\overline{x}\) must be a propositional term. However, the number of arguments needed to make \(px\) a proposition depends on the instantiation of the syntactic category \({\varvec{T}}/ NP \), and thus must be left underspecified, as seen in (5).

1.2 Problem of Variable Vectors

A more precise definition of \(\overline{x}\) (\(\lambda {\overline{x}}\) and \(M\overline{x}\)) could be given as follows:

Definition 1

(Variable vectors).

$$\begin{aligned} \lambda \overline{x}.M[\dots f\overline{x} \dots ]&\mathop {\equiv }\limits ^{ def }&\left\{ \begin{array}{l} M[\dots f \dots ] \\ \lambda x.\lambda \overline{x}.M[\dots (fx)\overline{x}\dots ] \quad where\ x\not \in fv \left( M[\dots f\overline{x}\dots ]\right) \end{array} \right. \end{aligned}$$

The problem with this pseudo-definition is that it is “defined” in a non-deterministic style, even though the choice between the two clauses is in fact determined by the form of the corresponding syntactic category. For example, in (2), the value of \(|\overline{x}|\) (namely, how many arguments are needed for f and g) depends on the instantiation of \({\varvec{T}}\). In (5), \(|\overline{x}|\) (how many arguments are needed for px) again depends on the instantiation of \({\varvec{T}}\). The strategy adopted here is that one should take the second choice in Definition 1 n times when \(|\overline{x}|=n\), but this determination requires reference to the corresponding syntactic category.

The fact that the definition of variable vectors depends on the corresponding syntactic category is problematic when trying to situate the definition within the formal theory of CCG. Moreover, the definition complicates implementation, because variable vectors are not a notion that can be defined entirely within the pure lambda calculus.

Another problem that arises with variable vectors is that they are not subterms from the perspective of a lambda calculus without them. In (4), for example, \(\overline{x}\equiv (p,y,x)\), namely, \(\lambda \overline{x}.\equiv \lambda p.\lambda y.\lambda x.\) and \(f\overline{x}\equiv (((fp)y)x)\); however, neither \(\lambda p.\lambda y.\lambda x.\) nor \(pyx\) is a subterm from the viewpoint of the syntax of the lambda calculus.

Thus, translation from the lambda calculus extended with variable vectors to the lambda calculus without them cannot be achieved by simply substituting a lambda term for each variable vector. Instead, translation requires a transformation of the syntactic tree of the lambda term. In other words, variable vectors underspecify the syntactic structure that surrounds them.

1.3 Previous Work on Category Variables

Presently available implementations of CCG parsers, such as the C&C parser (Clark and Curran 2004), EasyCCG (Lewis and Steedman 2014), and Jigg (Noji and Miyao 2016), have avoided implementation of category variables.

As is well-known, a naïve top-down CCG parser is not guaranteed to terminate, because of the existence of the following (infinite) path:

figure f

Likewise, a naïve bottom-up CCG parser with type-raising rules would not terminate on the following infinite path.

figure g

Linguistically, adopting type-raising rules allows the extraction of wh-phrases out of complex NP islands (Ozaki and Bekki 2012), which is an unwanted over-generation. When regarding CCG as a substructural combinatory logic, the type-raising rules correspond to the \(\mathcal {C}^*\)-combinator and thus strengthen the deduction theorem (i.e., extractability).

Thus, there are both computational and linguistic motivations for adopting neither the type-raising rules nor category variables in CCG parsers.

However, this is not a sufficient reason to abandon category variables altogether, since a categorial grammar may employ category variables without adopting the type-raising rules. This is a reasonable choice, particularly because category variables are useful for packing ambiguous but syntactically similar lexical items. Moreover, the computational benefit of such lexical packing is evident in some languages, such as Japanese, in which all arguments appear on the left-hand side of predicates (so that any quantificational NP can be given a single lexical item by using variable vectors).

1.4 Lightblue: A Robust CCG Parser with DTS

Lightblue is a wide-coverage CCG parser for Japanese, implemented in the Haskell programming language, which outputs semantic representations in terms of dependent type semantics (DTS; Bekki 2014; Bekki and Mineshima 2016).

DTS is a proof-theoretic discourse semantics of natural language based on dependent type theory (Martin-Löf 1984; Nordström et al. 1990), which extends TTG (Ranta 1994) with underspecified terms (notation \(@_i^A\)), through which anaphora resolution and presupposition binding are calculated via type checking and proof search in dependent type theory.

There are few Japanese parsers, other than lightblue, that yield (logical) semantic representations. Exceptions are Haruniwa (Tsaiwei et al. 2014) and ccg2lambda (Martínez Gómez et al. 2016). Both separate syntactic and semantic parsing: the former yields syntactic trees as output and the latter transforms them into semantic representations. However, since their lexicons are automatically obtained from large-scale corpora, they do not allow a semantics developer to add, delete, or modify individual lexical items during grammar development, which is the standard way of improving a lexicalized grammar.

Parser development in lightblue is purely lexicalized; the data type for a lexical entry is a triple of a phonetic form, a CCG syntactic category, and a DTS preterm, so a semantics developer has direct access to each lexical entry. The lightblue lexicon has about 994,416 lexical entries for open-class words, obtained from the dictionary of JUMAN, a part-of-speech tagger and morphological analyzer (Morita et al. 2015), which was automatically acquired (and distilled) from the World Wide Web, plus 758 lexical entries for closed-class words excerpted from Bekki (2010).

The CCG part of lightblue can make use of category variables, and the DTS part can make use of variable vectors, which significantly reduces the number of items in the lexicon. Lightblue can also use empty categories, which gives it additional flexibility, but this is beyond the scope of this paper and will be discussed elsewhere.

2 DTS with Variable Vectors in de Bruijn Notation

As a semantic theory, lightblue employs DTS in de Bruijn notation, in order to avoid variable-name clashes (or, alternatively, ubiquitous \(\alpha \)-conversion). The standard implementation of lambda calculus in de Bruijn notation is widely known from Pierce (2005), and its techniques can be naturally extended to DTS.

2.1 Syntax and Reduction

The syntax of DTS with variable names, extended with two constructors for variable vectors, is given as follows, where the constructors \(\lambda \overline{x}.M\) and \(M\overline{x}\) give binders and bindees of variable vectors, respectively.

Definition 2

(Syntax of DTS with variable names).

In de Bruijn notation, a variable name is replaced with a non-negative integer i, which is bound by the ith binder that takes scope over it (counting outward from the variable occurrence, starting from 0 at the innermost binder). The syntax is defined as described below, where \(j\in \left\{ 1,2 \right\} \).

Definition 3

(Syntax of DTS in de Bruijn notation).

The syntax of Definition 3 is naturally implemented by the following Haskell data type.

figure h
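Since the figure is not reproduced here, the following is a minimal sketch (not the actual lightblue code) of such a data type, restricted to the constructors discussed in this paper (Var, Pi, Lam, App, Sigma, Asp, Lamvec, and Appvec); the actual lightblue type contains further constructors (constants, projections, type universes, and so on), which are omitted.

-- A minimal sketch of a Preterm data type in de Bruijn notation,
-- covering only the constructors discussed in this paper.
data Preterm
  = Var Int               -- variable as a de Bruijn index
  | Lam Preterm           -- lambda abstraction
  | App Preterm Preterm   -- function application
  | Pi Preterm Preterm    -- dependent function type (x:A) -> B
  | Sigma Preterm Preterm -- dependent sum type [x:A, B]
  | Asp Int Preterm       -- underspecified term @_i^A
  | Lamvec Preterm        -- binder of a variable vector
  | Appvec Int Preterm    -- application of a functor to a variable vector
  deriving (Eq, Show)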

Semantic representations of (a nominative version of) quantifiers every and some in DTS are given as follows (Bekki and Mineshima 2016):

figure i

With the data type Preterm, these representations are described in Haskell code as follows.

figure j

The definitions of free variables and substitution for Preterm are obtained by extending their standard lambda calculus definitions, from Pierce (2005), with Pi, Sigma, Asp, Lamvec, and Appvec. The form \(\texttt {subst\,\,m\,\,l\,\,i}\) is understood as \(\texttt {m[l/i]}\), that is, the preterm m in which an index i is substituted by a preterm l.

figure k
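As the figure is omitted here, the following is a sketch of what such a subst could look like under the assumed Preterm type above (the real lightblue definition covers more constructors); shiftIndices is the shifting function defined next.

-- Sketch of substitution: subst m l i computes m[l/i], replacing every
-- occurrence of the index i in m by l. Whenever l is pushed under a binder
-- (Pi, Lam, Sigma, Lamvec), its indices are shifted and the target index grows.
subst :: Preterm -> Preterm -> Int -> Preterm
subst m l i = case m of
  Var j | j == i    -> l
        | otherwise -> Var j
  Lam b      -> Lam (subst b (shiftIndices l 1 0) (i + 1))
  App f a    -> App (subst f l i) (subst a l i)
  Pi a b     -> Pi (subst a l i) (subst b (shiftIndices l 1 0) (i + 1))
  Sigma a b  -> Sigma (subst a l i) (subst b (shiftIndices l 1 0) (i + 1))
  Asp j a    -> Asp j (subst a l i)
  Lamvec b   -> Lamvec (subst b (shiftIndices l 1 0) (i + 1))
  Appvec j f -> Appvec j (subst f l i)  -- the vector itself is manipulated only by the functions of Sect. 2.2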

The essence of the definition of subst lies in the use of the shiftIndices function. The form shiftIndices m d i appears in the cases of Pi, Lam, Sigma, and Lamvec above and executes a d-place shift (Pierce 2005): it adds d to every index within m that is greater than or equal to i, thereby accommodating all indices to the new environment in which m is placed. The shiftIndices function is recursively defined as follows.

figure l
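Under the same assumptions, a sketch of the d-place shift could look as follows (again, the real definition covers more constructors).

-- shiftIndices m d i adds d to every index in m that is greater than or equal
-- to the cutoff i; the cutoff is incremented whenever the recursion goes under
-- a binder, so indices bound inside m stay intact.
shiftIndices :: Preterm -> Int -> Int -> Preterm
shiftIndices m d i = case m of
  Var j | j >= i    -> Var (j + d)
        | otherwise -> Var j
  Lam b      -> Lam (shiftIndices b d (i + 1))
  App f a    -> App (shiftIndices f d i) (shiftIndices a d i)
  Pi a b     -> Pi (shiftIndices a d i) (shiftIndices b d (i + 1))
  Sigma a b  -> Sigma (shiftIndices a d i) (shiftIndices b d (i + 1))
  Asp j a    -> Asp j (shiftIndices a d i)
  Lamvec b   -> Lamvec (shiftIndices b d (i + 1))
  Appvec j f | j >= i    -> Appvec (j + d) (shiftIndices f d i)
             | otherwise -> Appvec j (shiftIndices f d i)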

For example, \({\texttt {(}\lambda \texttt {M)[L/i]}}\) reduces to \(\lambda \texttt {(M[shiftIndices\;L\;1\;0/i+1])}\), adding 1 to every index in L that is greater than or equal to the cutoff 0 (i.e., to every free index of L), since L, the preterm to be substituted, is going through one \(\lambda \).

The role of the restriction “greater than or equal to” is to keep intact those indices in L that are bound within L: as shiftIndices goes under a binder inside L, the cutoff is incremented, so bound indices are never shifted. For example, when L, the preterm to be substituted, goes through two \(\lambda \)s, the substitution target becomes \(i+2\) and the free indices of L are shifted twice, while its bound indices remain unchanged:

$$\begin{aligned} (\lambda \lambda \texttt {M})[\texttt {L}/i]= & {} \lambda \left( (\lambda \texttt {M})[\texttt {shiftIndices\;L\;1\;0}/i+1]\right) \\= & {} \lambda \lambda \left( \texttt {M}[\texttt {shiftIndices\;(shiftIndices\;L\;1\;0)\;1\;0}/i+2]\right) \end{aligned}$$

Note that the definitions of Pi and Sigma add 1 only in the nuclear scope part (not in the restriction part). This reflects that in constructions such as \( \left( x{:}A \right) \rightarrow B \) and \( \left[ \begin{array}{l} x{:}A \\ B \end{array} \right] \), A is outside the scope of x.
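Both points can be illustrated with the sketched shiftIndices above (a hypothetical session, not lightblue output): indices bound inside the shifted term stay intact, and in a Pi construction only the nuclear scope sees the incremented cutoff.

-- >>> shiftIndices (Lam (App (Var 0) (Var 1))) 1 0
-- Lam (App (Var 0) (Var 2))   -- Var 0 is bound by the Lam and is left intact
-- >>> shiftIndices (Pi (Var 0) (Var 0)) 1 0
-- Pi (Var 1) (Var 0)          -- the restriction's Var 0 is free and shifted;
--                             -- the scope's Var 0 is bound by the Pi itself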

The above definition of subst is then used to define beta reduction of preterms as follows, which also uses the shiftIndices function.

figure m
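The figure is omitted; a sketch of such a beta reduction, following the standard de Bruijn recipe (substitute a 1-shifted argument for index 0, then shift the result down by 1), could look like this under the same assumptions.

-- Sketch of beta reduction on Preterm (a naive normaliser; it may not
-- terminate on non-normalising terms).
betaReduce :: Preterm -> Preterm
betaReduce preterm = case preterm of
  App f a -> case betaReduce f of
    Lam b -> betaReduce (shiftIndices (subst b (shiftIndices a 1 0) 0) (-1) 0)
    f'    -> App f' (betaReduce a)
  Lam b      -> Lam (betaReduce b)
  Pi a b     -> Pi (betaReduce a) (betaReduce b)
  Sigma a b  -> Sigma (betaReduce a) (betaReduce b)
  Asp j a    -> Asp j (betaReduce a)
  Lamvec b   -> Lamvec (betaReduce b)
  Appvec j f -> Appvec j (betaReduce f)
  m          -> m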

2.2 Expanding and Shrinking of Variable Vectors

The expanding and shrinking operations on variable vectors are built from three primitive functions: addLambda, deleteLambda, and replaceLambda.

addLambda i m works on subterms of the form Appvec j f within m whose vector is bound by the ith binder counted from the position of the call addLambda i m (equivalently, the jth binder counted from the position of Appvec j f), and replaces each such subterm with Appvec j (App (addLambda i f) (Var (j+1))). In the second choice in Definition 1, this operation is used for replacing \(M[\dots f\overline{x} \dots ]\) with \(M[\dots (fx)\overline{x} \dots ]\) when we replace \(\lambda {\overline{x}}\) with \(\lambda x.\lambda {\overline{x}}\).

figure n
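The figure is omitted; the following sketch reconstructs addLambda from the description above (with f written for the functor inside Appvec), under the assumed Preterm type.

-- addLambda i m: a new binder is inserted just outside the i-th binder, so
-- every index pointing past that binder is incremented, and the vector bound
-- by it additionally receives the fresh variable as its first argument.
addLambda :: Int -> Preterm -> Preterm
addLambda i m = case m of
  Var j | j > i     -> Var (j + 1)
        | otherwise -> Var j
  Lam b      -> Lam (addLambda (i + 1) b)
  App f a    -> App (addLambda i f) (addLambda i a)
  Pi a b     -> Pi (addLambda i a) (addLambda (i + 1) b)
  Sigma a b  -> Sigma (addLambda i a) (addLambda (i + 1) b)
  Asp j a    -> Asp j (addLambda i a)
  Lamvec b   -> Lamvec (addLambda (i + 1) b)
  Appvec j f | j == i    -> Appvec j (App (addLambda i f) (Var (j + 1)))
             | j > i     -> Appvec (j + 1) (addLambda i f)
             | otherwise -> Appvec j (addLambda i f)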

deleteLambda i m works on subterms of the form Appvec j f within m (under the same conditions as addLambda), and replaces each of them with deleteLambda i f. In other words, deleteLambda deletes the occurrence(s) of the corresponding variable vector from m.

figure o
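Under the same assumptions, a sketch of deleteLambda; since the vector's binder disappears, indices pointing past it are decremented in this reconstruction.

-- deleteLambda i m: drop the vector bound by the i-th binder; occurrences
-- Appvec i f are replaced by f, and indices pointing past the removed binder
-- are decremented by one.
deleteLambda :: Int -> Preterm -> Preterm
deleteLambda i m = case m of
  Var j | j > i     -> Var (j - 1)
        | otherwise -> Var j
  Lam b      -> Lam (deleteLambda (i + 1) b)
  App f a    -> App (deleteLambda i f) (deleteLambda i a)
  Pi a b     -> Pi (deleteLambda i a) (deleteLambda (i + 1) b)
  Sigma a b  -> Sigma (deleteLambda i a) (deleteLambda (i + 1) b)
  Asp j a    -> Asp j (deleteLambda i a)
  Lamvec b   -> Lamvec (deleteLambda (i + 1) b)
  Appvec j f | j == i    -> deleteLambda i f
             | j > i     -> Appvec (j - 1) (deleteLambda i f)
             | otherwise -> Appvec j (deleteLambda i f)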

replaceLambda i m works on subterms of the form Appvec j f within m (under the same conditions as addLambda and deleteLambda), and replaces each of them with App (replaceLambda i f) (Var j). Namely, it replaces the occurrence(s) of the corresponding variable vector with a variable.

figure p
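And a sketch of replaceLambda; here the vector's binder is reinterpreted as an ordinary lambda by the caller, so no index shifting is necessary.

-- replaceLambda i m: replace the vector bound by the i-th binder with an
-- ordinary variable; Appvec i f becomes App f (Var i), all other indices are
-- left untouched (the binder itself is turned into a Lam by the caller).
replaceLambda :: Int -> Preterm -> Preterm
replaceLambda i m = case m of
  Var j      -> Var j
  Lam b      -> Lam (replaceLambda (i + 1) b)
  App f a    -> App (replaceLambda i f) (replaceLambda i a)
  Pi a b     -> Pi (replaceLambda i a) (replaceLambda (i + 1) b)
  Sigma a b  -> Sigma (replaceLambda i a) (replaceLambda (i + 1) b)
  Asp j a    -> Asp j (replaceLambda i a)
  Lamvec b   -> Lamvec (replaceLambda (i + 1) b)
  Appvec j f | j == i    -> App (replaceLambda i f) (Var j)
             | otherwise -> Appvec j (replaceLambda i f)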

Using these three functions, the definition of variable vectors,

$$\begin{aligned} \lambda \overline{x}.M[\dots f\overline{x} \dots ]&\mathop {\equiv }\limits ^{ def }&\left\{ \begin{array}{l} M[\dots f \dots ] \\ \lambda x.\lambda \overline{x}.M[\dots (fx)\overline{x}\dots ] \end{array} \right. \end{aligned}$$

can be represented by the following (pseudo-)Haskell code:

$$\begin{aligned} \texttt {Lamvec M}[\dots \texttt {Appvec j f} \dots ]&\mathop {\equiv }\limits ^{ def }&\left\{ \begin{array}{l} \texttt {deleteLambda 0 M} \\ \texttt {Lam (Lamvec (addLambda 0 M))} \end{array} \right. \end{aligned}$$

Here, \(f\overline{x}\) in \(M[\dots f\overline{x} \dots ]\) is replaced with f by deleteLambda 0 M, and with \((fx)\overline{x}\) by addLambda 0 M. Note that the side condition \(x\not \in fv \left( M[\dots f\overline{x}\dots ]\right) \) of Definition 1 is no longer necessary in de Bruijn notation.
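To see how the two clauses compose, the following hypothetical helper (not part of lightblue, built only from the sketches above) applies the second clause n times and then the first clause once, turning the body of a Lamvec into n ordinary lambdas; a variant could end with replaceLambda 0 instead of deleteLambda 0 to keep one final argument.

-- expandVec n m: given the body m of a Lamvec, produce a preterm with exactly
-- n ordinary lambdas in place of the variable vector.
expandVec :: Int -> Preterm -> Preterm
expandVec 0 m = deleteLambda 0 m
expandVec n m = Lam (expandVec (n - 1) (addLambda 0 m))

-- e.g. for the body "f applied to the vector" (f bound just outside the vector):
-- >>> expandVec 2 (Appvec 0 (Var 1))
-- Lam (Lam (App (App (Var 2) (Var 1)) (Var 0)))   -- i.e. λx.λy. f x y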

2.3 Interaction Between Category and Lambda Terms

The remaining task is to provide a function that takes a pair comprising a syntactic category and a DTS preterm and returns a preterm within which variable vectors are expanded or shrunk as needed. The transvec function, defined as follows, does this job, transforming the variable vectors within a given preterm by adjusting the preterm's number of arguments using the three functions of the previous section.

figure q
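The figure is omitted; the following deliberately simplified sketch reconstructs the idea from the worked example in Sect. 3, assuming a toy category type (the real lightblue category type carries syntactic features and indexed category variables, and the real transvec has further clauses).

-- A toy category type: SL x y is x/y, BS x y is x\y, T is an (unindexed)
-- category variable.
data Cat = S | NP | SL Cat Cat | BS Cat Cat | T

-- For each argument slot of the category, expand the vector by one ordinary
-- lambda (addLambda); at S, close it with one more lambda (replaceLambda);
-- at a category variable, leave the preterm untouched.
transvec :: Cat -> Preterm -> Preterm
transvec (SL x _) (Lamvec m) = Lam (transvec x (Lamvec (addLambda 0 m)))
transvec (BS x _) (Lamvec m) = Lam (transvec x (Lamvec (addLambda 0 m)))
transvec S        (Lamvec m) = Lam (replaceLambda 0 m)
transvec _        m          = m    -- category variables, NP, and non-vector preterms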

By using the transvec function, we can implement a CCG parser with category variables in the syntax and variable vectors in the semantics.

3 Some Examples

Let us demonstrate some parse results of lightblue. The first example is a Japanese verb phrase consisting of a quantifier (in the object position) and a transitive verb.

figure r

The following derivation is an output of lightblue given the phrase subete-no ningen-ni (except for a minor modification that replaces names of constant symbols in Japanese with equivalents in English).

figure s
figure t

In the above derivation, \({\varvec{T}}\) is a category variable. The superscripts on \({\varvec{T}}\) indicate that structure sharing is taking place, and each of their scopes is local; that is, the category variables within the phrase subete-no and those within the phrase ni do not share their structures with each other. The subscript of \({\varvec{T}}\) shows the “final-output category” of \({\varvec{T}}\). For example, \({\varvec{T}}_{S}\) must be unified with a functional category that ultimately returns S, or with S itself (\({\varvec{T}}_{S}\) is equal to \(S|\$\) in the standard CCG notation).

The subscripts attached to syntactic categories other than \({\varvec{T}}\) show their syntactic features. For example, an NP carrying the ni feature is an NP marked with the dative case, and an NP carrying the no-case feature bears no case marking. A feature such as v : 5 : k shows that a verb belongs to the conjugation series 5k. The symbol | is a disjunction (or a union) of syntactic features, so that unification between syntactic categories returns their intersection if it is not empty. Features such as neg|cont show conjugation forms (i.e., the negation form or the continuous form). The binary features \(\pm t,\pm p,\pm n\) represent past/non-past, polite/non-polite, and negated/non-negated, respectively. Details of these syntactic features of Japanese are described in Bekki (2010). The number on the right-hand side of a lexical item corresponds to an entry number in Bekki (2010).

The notation of DTS that lightblue adopts follows that of Bekki and Mineshima (2016). Although the internal representations of DTS preterms are in de Bruijn notation, they are transformed into DTS preterms with variable names when visualized. The point of the above derivation lies in the last step, where function application takes place: the left node is every man, a quantifier NP. How many arguments it will take after merging with a predicate is underspecified by \({\varvec{T}}\), and thus its semantic representation contains a variable vector. Meanwhile, ni is a dative case marker (which is semantically vacuous) whose whole syntactic category unifies with the argument part of the quantifier NP's category. As a result, beta reduction between their semantic representations ends up with the following preterm.

This is a Lamvec construction, and the number of arguments that it will take is underdetermined. However, since its corresponding syntactic category (which can be read off from the derivation) is a functional category, we can tell that it takes at least one more argument and only then an arbitrary number of further arguments. Thus, the variable vector should be expanded one step and become \(\lambda x_0.\lambda {\overline{x}_1}.\dots \), and the inner \(\overline{x}_1\) should be replaced with the sequence \((x_0,\overline{x}_1)\). The transvec function ensures this behavior. First, the pair of the corresponding syntactic category and the preterm matches the following line of transvec.

figure u

Here, m corresponds to the following subterm.

figure v

for which addLambda 0 m is evaluated, adding a variable \(x_0\) at the position of \(\overline{x}\).

figure w

This beta-reduces to the following preterm.

figure x

Since this form is again a Lamvec construction, transvec applies itself recursively to it, this time making reference to the syntactic category \({\varvec{T}}\); for a category variable, this recursive call does nothing. After that, one more lambda operator (which binds \(x_0\)) is added to the top, and the whole result winds up in the following form.

figure y

This subterm then merges with the transitive verb aw-u (“meet”). The number of arguments of aw-u is two (a nominative NP and a dative NP). Thus, the final syntactic category becomes the one shown below.

figure z

The point here is that the argument category on the left-hand side unifies with the verb's category on the right-hand side. Therefore, \({\varvec{T}}\), the result of this merge operation, must be \(S\backslash NP _{ga}\), which takes two more arguments (one for the ga-marked NP and another for a continuation). This tells the transvec function to transform \(\lambda \overline{x}_1\) in the semantic representation into two \(\lambda \)s, via addLambda and then replaceLambda, as shown below. First, \(\beta \)-reduction proceeds as follows:

figure aa

Then, the pair of the corresponding syntactic category and the above preterm matches the following line of the transvec function, in the same way as in the previous example.

figure ab

Applying addLambda 0 and putting Lamvec on top yields the following preterm.

figure ac

This is sent to the recursive call of transvec, making reference to the corresponding syntactic category S. This time, it matches with the following line of transvec.

figure ad

Application of the replaceLambda function replaces the variable vector \(\overline{x}_1\) with a variable \(x_1\), and putting another \(\lambda \) on top yields the following preterm, as expected.

figure ae
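Using the toy sketches from Sect. 2, the behavior just described can be reproduced in miniature (a hypothetical example, with syntactic features stripped from the categories): under the category \(S\backslash NP \), a variable-vector body ends up with exactly two ordinary lambdas.

-- >>> transvec (BS S NP) (Lamvec (Appvec 0 (Var 1)))
-- Lam (Lam (App (App (Var 2) (Var 1)) (Var 0)))
-- The outer Lam comes from the NP argument slot (addLambda), the inner one
-- from closing the vector at S (replaceLambda).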

4 Conclusion and Future Work

While the use of category variables in CCG offers advantages such as packing ambiguous syntactic candidates during parsing, we have also seen that it requires the semantic language to be extended with variable vectors, whose formalization and implementation have so far not been straightforward.

In this article, we introduced lightblue, a Japanese CCG parser implementation equipped with variable vectors and a mechanism for expanding and shrinking them according to the corresponding syntactic categories. Since the semantic language of lightblue is DTS, which is based on dependent type theory, a natural extension of simply typed lambda calculus, its implementation in a functional programming language is straightforward and natural. The implementation consists of three primitive functions, addLambda, deleteLambda, and replaceLambda, that expand and shrink variable vectors, together with the transvec function, which transforms a preterm, with reference to the corresponding syntactic category, by choosing and applying the appropriate clause of Definition 1 according to the number of arguments that the category anticipates.

Although the formalism and implementation of variable vectors in lightblue provide a solution to the problems we addressed, some issues remain to be pursued. For example, future research should prove a completeness result for the transvec function with respect to a suitable notion of normal form for typed lambda calculus (or dependent type theory) with variable vectors. In each step of applying combinatory rules, does the transvec function always transform a given preterm into a normal form? Does the transvec function terminate, and under what conditions? These remain open issues, which we believe provide an attractive research topic regarding syntactic–semantic transparency in combinatory (and other) categorial grammars.