1 Agent-Based Ontology

When two agents communicate with each other by means of a natural language, the speaker uses its external action interface to produce a sequence of language surfaces while the hearer uses its external recognition interface to identify the elements of the sequence. The sequence is time-linear in the sense that it is linear like time and in the direction of time. In accordance with the Western writing convention, the progression of time is shown in the direction from left to right.

1.1 Physical Framework of Communication

figure a

The recognition and action interfaces of the agents are indicated by half circles marked with r and a. The language surfaces are represented by boxes containing s1, s2, s3, \(\ldots \). As agent-external, modality-dependent sound waves (speech), dots on paper (writing), or gestures (signing), the surfaces may be measured and described with the methods of the natural sciences, but have no meaning and no grammatical properties whatsoever.

The first surface leaving the speaker is the first to reach the hearer. The last surface leaving the speaker is the last to reach the hearer.Footnote 1 All other aspects of language communication are agent-internal, modalityFootnote 2-independent, and cognitive.

Modality-independence may be illustrated by the basic operations of arithmetic, i.e., addition, subtraction, multiplication, and division. They exist at a level of abstraction which may be realized equivalently as the operations (i) of a human, (ii) a mechanical calculator, or (iii) an electronic computer.Footnote 3

With autonomous robots still absent in today’s computational linguistics, the external framework Sect. 1.1 may be simulated, using the keyboard and the screen of standard computers as primitive recognition and action components. This, however, works only for the transfer of surfaces. It does not work for nonlanguage recognition and action, which are required for a cognitive reconstruction of reference. For example, the agent’s ability to refer to agent-external items is needed for fulfilling a request like Pick up the blue square! or to report how many blue squares there are in the agent’s current task environment.

2 Elementary Concepts

The minimum in reconstructing higher-level cognition is (i) an agent-internal memory, (ii) a central control embedded into and interacting with memory, (iii) a mapping from the recognition interface to central control, and (iv) a mapping from central control to the action interface. The mappings between modality-dependent raw data and modality-independent concepts are formally based on the type-token distinction, familiar from philosophy.Footnote 4 The type of a concept describes the necessary properties, while an associated token is an instantiation with certain additional accidentalFootnote 5 properties. As an example consider the recognition of colors (Hausser 1989, p. 296 ff). In physics, they are defined as intervals on the one-dimensional scales of electromagnetic wavelength and frequency. Accordingly, the type and a token of the color blue may be shown as follows.

2.1 Type and Token of the Color Called blue

figure b

The type specifies the wavelength and the frequency of the color blue by means of variables which are restricted to the corresponding intervals provided by physics. The token uses constants which lie within these intervals.

In the recognition of colors, the type provided by memory and the raw input data provided by a sensor interact as follows, resulting in a classified token.

2.2 Concept Type and Token in Color Recognition

figure c

A sensor measures the wavelength 470 nm and frequency 640 THz in an agent-external object. These values lie within the intervals 490–450 nm and 610–670 THz of the color blue and thus match the type. In the instantiating token, the wavelength and frequency intervals of the type are replaced by the measured values. The feature structures representing types and tokens may be extended as needed, for example, with an additional attribute for color intensity.
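The type-token interaction of Sect. 2.2 may be sketched in a few lines of code. This is a minimal illustration, not the DBS implementation; the data layout and the function name recognize are assumptions made here for exposition.

```python
# Minimal sketch of color recognition by type-token matching (Sect. 2.2).
# The type restricts wavelength and frequency to the intervals provided by
# physics; a sensor reading that falls within both intervals instantiates
# a token in which the intervals are replaced by the measured constants.

BLUE_TYPE = {
    "color": "blue",
    "wavelength_nm": (450, 490),   # interval of the type
    "frequency_thz": (610, 670),
}

def recognize(raw, ctype):
    """Classify raw sensor data by a concept type; return a token or None."""
    w_lo, w_hi = ctype["wavelength_nm"]
    f_lo, f_hi = ctype["frequency_thz"]
    if w_lo <= raw["wavelength_nm"] <= w_hi and f_lo <= raw["frequency_thz"] <= f_hi:
        return {"color": ctype["color"],            # instantiated token
                "wavelength_nm": raw["wavelength_nm"],
                "frequency_thz": raw["frequency_thz"]}
    return None                                     # raw data do not match

token = recognize({"wavelength_nm": 470, "frequency_thz": 640}, BLUE_TYPE)
```

Extending the feature structures, for example with an intensity attribute, would add a further interval to the type and a further constant to the token.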

Next consider the type and the token of a two-dimensional geometric object.

2.3 Concept Type and Token of the Concept square

figure d

Here, the type and the token share attributes which specify (i) the number of equally long edges and (ii) the angle of their intersections. The type and the token differ only in their edge lengths. The latter is accidental in that the type matches an infinite number of square tokens with different edge lengths.Footnote 6

In analogy to Sect. 2.2, recognition of a square may be shown as follows.

2.4 Type and Token in Recognizing a Square

figure e

The type matches the outline of all kinds of different squares, whereby its variables are instantiated in the resulting tokens.

Today, there exist pattern recognition programs which are already quite good at recognizing geometric objects.Footnote 7 They differ from our approach in that they are based almost completely on statistics. However, even if the terms type and token may not be found in their theoretical descriptions, the type-token distinction is nevertheless implicit in any pattern recognition processing. Furthermore, the rule-based, incremental proceduresFootnote 8 of pattern recognition presented in Hausser (2005) are well-suited to be combined with statistical methods.Footnote 9

The elementary concepts of nonlanguage recognition are complemented by those of action. For example, the concept take is defined as the type of a gripping action which is instantiated as a token to be realized as raw data. The token differs from the type in that it is adapted to a specific gripping occasion. It holds in general for recognition that raw data are classified by a type and instantiated as a token, while in action a type is specialized into a token which is passed to a suitable action component for realization as raw data (Hausser 1999, 3.3.5).

The interaction between the agent’s external interfaces, the types, the tokens, and the memory must be hand in glove. For example, if the agent has no sensor for measuring electromagnetic wavelength/frequency, colors cannot be recognized – even if the proper types were available from memory. Conversely, without the types the raw data provided by a suitable sensor cannot be classified and instantiated as tokens. Also, without a memory the types cannot be provided for recognition and action, and the tokens cannot be stored.

3 Data Structure and Database Schema

The concepts defined in Sects. 2.1 and 2.3 constitute elementary cognitive contents, but they do not provide any means for being connected, as in blue_square. For this, DBS lexically embeds the concepts as core values into nonrecursiveFootnote 10 feature structures with ordered attributes, called proplets (because they are the elementary building blocks of propositions, in analogy to droplet). A feature structure is built from features. In computer science, a feature is defined as an attribute-value pair (avp), e.g. [noun: square], with noun: as the attribute and square as the value.

The embedding of core values into proplets allows their concatenation by means of value copying. For example, the proplets blue and square may be connected into the content of blue_square as follows.Footnote 11

3.1 Concatenation by Cross-Copying

figure f

The nature of the semantic relation between blue and square is characterized by the attributes mdr (modifier) and mdd (modified). The relation is implemented by copying the core value of square into the mdd slot of blue and the core value of blue into the mdr slot of square. In addition, the prn value of blue, here 17, is copied into the prn slot of the next word proplet square.
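The cross-copying of Sect. 3.1 may be sketched as follows. Representing proplets as Python dicts and the helper name cross_copy_adnominal are illustrative assumptions; the attribute names follow the text.

```python
# Sketch of concatenation by cross-copying (Sect. 3.1). Proplets are flat
# attribute-value structures; the adnominal relation is coded by copying
# each core value into the other proplet's mdd/mdr slot and by passing the
# prn value of blue on to the next word proplet square.

def cross_copy_adnominal(adj, noun):
    adj["mdd"] = noun["noun"]    # modified: core value of the noun
    noun["mdr"] = adj["adj"]     # modifier: core value of the adjective
    noun["prn"] = adj["prn"]     # share the proposition number

blue = {"adj": "blue", "mdd": "", "prn": 17}
square = {"noun": "square", "mdr": "", "prn": ""}
cross_copy_adnominal(blue, square)
# blue:   {'adj': 'blue', 'mdd': 'square', 'prn': 17}
# square: {'noun': 'square', 'mdr': 'blue', 'prn': 17}
```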

Next consider extending Sect. 3.1 to an intrapropositional coordination.

3.2 Coordination in big blue square

figure g

The relation of intrapropositional coordination is coded by the nc (next conjunct) and pc (previous conjunct) attributes of the conjoined adjectives.

The diagonal lines in Sect. 3.2 are intended as optical support for the reader. Technically, however, they are redundant and may be omitted. The real method of establishing semantic relations in DBS is by addresses coded declaratively as values and implemented procedurally as pointers. This method makes the proplets forming a complex content order-free, allowing the database to store them independently of the semantic relations between them.

For example, no matter where the storage mechanism of the database puts the adnominal big, its modified may be found via the primary key consisting of the mdd value square and the prn value 17. Similarly, no matter where the noun square is stored, its modifier may be found via the mdr value big and the prn value 17. And accordingly for the intrapropositional coordination in Sect. 3.2.

As another example consider the content of Julia knows John., represented as the following set of connected proplets.

3.3 Content of Julia Knows John. As a Set of Proplets

figure h

The simplified proplets are held together by a common prn value, here 625. The functor-argument structure is coded solely in terms of attribute values. For example, the Julia and John proplets specify their functor as know, while the know proplet specifies Julia and John as its arguments. Because of their nonempty sur(face) slots, the proplets are language proplets, in contradistinction to the proplets in Sects. 3.1 and 3.2, which are context proplets.

For storage and retrieval, a proplet is specified uniquelyFootnote 12 by its core and prn values (primary key). This suggests a two-dimensional database schema, as in a classic network database (Elmasri and Navathe 2010). However, instead of using member and owner records, DBS uses member proplets and owner values.

The result is called a word bank. Its database schema consists of a column of owner values in their alphabetical order (vertical). Each owner value is preceded by an empty slot, called the now front, and a list of member proplets (horizontal); together they constitute a token line.Footnote 13

As an example, consider storing a nonlanguage content.

3.4 Storing the Proplets of Sect. 3.3 in a Word Bank


figure i

The proplets in a token line all have the same core value and are in the temporal order of arrival, reflected by their prn values (Hausser 2006, Sects. 11.2, 11.3).

In contrast to the task of designing a practical schema for arranging the books in a private library, the sorting of proplets into a word bank is simple and mechanical. The letter sequence of a proplet’s core value completely determines its token line for storage: the storage location for any new arrival is the penultimate position (now front) in the corresponding token line. When this slot is filled, the now front is reopened by moving the owner value one slot to the right (or, equivalently, pushing the member proplets one slot to the left, as in a push-down automaton).
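The storage and retrieval mechanism just described may be sketched as follows. Using a Python dict of lists for the token lines, with list append standing in for filling and reopening the now front, is an illustrative simplification; the class and method names are assumptions.

```python
# Sketch of a word bank (Sect. 3.4). Token lines are keyed by the owner
# value (the proplet's core value); each line keeps its member proplets in
# order of arrival. Storage is mechanical: the core value alone determines
# the token line, and the new proplet is added at the now front.
from collections import defaultdict

class WordBank:
    def __init__(self):
        self.token_lines = defaultdict(list)   # owner value -> member proplets

    def core_value(self, proplet):
        # by convention here, the first attribute holds the core value
        return next(iter(proplet.values()))

    def store(self, proplet):
        # append plays the role of filling and reopening the now front
        self.token_lines[self.core_value(proplet)].append(proplet)

    def retrieve(self, core, prn):
        # primary key: core value plus prn value (Sect. 3.3)
        for p in self.token_lines[core]:
            if p.get("prn") == prn:
                return p
        return None

wb = WordBank()
wb.store({"noun": "Julia", "fnc": "know", "prn": 625})
wb.store({"verb": "know", "arg": ["Julia", "John"], "prn": 625})
wb.store({"noun": "John", "fnc": "know", "prn": 625})
```

Because retrieval goes by core and prn values only, the proplets remain order-free in the sense of Sect. 3.1: where the database puts them is irrelevant to finding them.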

By storing content like sediment, the stored data are never modified and any need for checking consistency is obviated. Changes of fact are written to the now front, like diary entries recording changes of temperature. Current data which refer to old ones use addresses as core values, implemented as pointers.

4 Cycle of Natural Language Communication

The transfer mechanism of content from the speaker to the hearer is based on external surfaces which have neither a meaning nor any grammatical properties (Sect. 1.1). They must, however, belong to a language which the speaker and the hearer have each learned.

The learning enables the hearer to (i) recognize surfaces, (ii) use the recognized but otherwise unanalyzed surfaces for looking up lexical entries which provide the meaning and the grammatical properties, and (iii) connect them with the semantic relations of functor-argument and coordination. The learning enables the speaker to (i) navigate along the semantic relations between proplets, (ii) produce language-dependent word form surfaces from the core values of proplets traversed, and (iii) handle function wordFootnote 14 precipitation, micro word order, and agreement.

4.1 Definition of Successful Communication

Natural language communication is successful if the content, mapped by the speaker into a sequence of external word form surfaces, is reconstructed and stored equivalently by the hearer.

The transfer of information from the speaker to the hearer, based solely on (i) unanalyzed external surfaces, (ii) the data structure of proplets, (iii) the database schema of a word bank, and (iv) the content Sect. 3.4, may be shown schematically as follows.

4.2 Natural Language Transfer Mechanism

figure j

The speaker’s navigation through a set of connected proplets serves as the conceptualization (what to say) and as the basic serialization (how to say it) of natural language production (McKeown 1985; Kass and Finin 1988). The hearer’s interpretation consists in deriving a corresponding set of proplets, based on automatic word form recognition and syntactic-semantic parsing. The time-linear order of the sign induced by the speaker’s navigation is eliminated in the hear mode, allowing storage of the proplets in accordance with the database schema of the content-addressableFootnote 15 word bank. When the agent switches into the speak mode, order is reintroduced by navigating along the semantic relations between the proplets.

5 Conceptual Reconstruction of Reference

In DBS, a cognitive content is defined as a set of proplets connected by address. Proplets with a non-empty sur(face) slot (Sect. 3.3) represent a language content. Proplets with an empty sur slot (Sect. 3.1) represent a context content. Otherwise, language and context proplets are alike. This holds specifically for their storage and retrieval in a word bank, which is based solely on their core value and order of arrival.

Conceptually, however, reference may be modeled by (i) separating the levels of language and context, (ii) introducing the place of pragmatics as an interaction between the two levels, and (iii) distinguishing peripheral and central cognition.

5.1 Conceptual View of Interfaces and Components

figure k

Externally, the agent’s interfaces for language and nonlanguage recognition are the same, as are those for language and nonlanguage action.Footnote 16 Internally, however, raw input data are separated by peripheral cognition into language and nonlanguage content (diagonal input arrows). Conversely, in action a content is realized as raw output data regardless of whether it originated at the language or at the context level (diagonal output arrows).

For example, as a sound pattern the surface blue square will have a meaningful interpretation at the language level by someone who has learned English, but be treated as an uninterpreted noise at the context level by someone who has not. Conversely, even though the action of denying entrance may be realized by telling the person to go away (originating at the language level) or by slamming the door (originating at the context level), both result in raw output data.

The distinction between the language and the context component provides a cognitive treatment of reference. Reference to an object in the agent’s current environment is called immediate reference, while reference to cognitive content existing only in the agents’ memory, for example, J.S. Bach, is called mediated reference. For mediated reference, the agent-based ontology of DBS (Sect. 1) is essential.

As an example of immediate reference consider a speaker and a hearer in a common task environment (Newell and Simon 1972), both looking at a blue square. If the speaker says Take the blue square, the noun phrase refers to the object in question. Similarly for the hearer, for whom fulfilling the request requires reference to the same object.

Postulating an external relation between a surface and its referent would be a reification fallacy. Instead we reconstruct immediate reference cognitively.

5.2 Immediate Reference as a Purely Cognitive Procedure

figure l

Immediate reference relies on the agents’ action and recognition interfaces for language (upper level) and the recognition of nonlanguage content (lower level). Mediated reference, in contrast, relies on language action and recognition (upper level) and the existence of corresponding content in the agent’s memory. While immediate reference may be regarded as prototypical for the origin of language, it is a special case of mediated reference in that it has the additional requirement of context recognition (Hausser 2006, Sect. 2.5).

Terminological Remark

Computer Science uses the term “reference” differently from philosophy and linguistics. A computational reference is an address in a storage location. This may be coded as (i) a symbolic address (declarative) or as (ii) a pointer to a physical storage location in the memory hardware (procedural). The term “generalized reference” is used in image reconstruction (computer vision).

In DBS, the term “reference” is used in the sense of philosophy and linguistics. However, the term is generalized insofar as no agent-external “representational token” is required (Sect. 6.3, constellations 1 and 3). Recanati (1997), Pelczar and Rainsbury (1998), and others use “generalized reference” for an analysis of the sign kind name which allows the surface Mary, for example, to refer to several individuals, in contradistinction to Russell’s (1905) definite description analysis of “proper” names, which requires a unique referent.

The DBS analysis of names also allows different referents (Sect. 8.4). However, while the “generalized reference” of Recanati, Pelczar et al., and others is based on assimilating names to indexicals, the DBS analysis is based on an act of baptism which is generalized in that it may occur implicitly as well as explicitly. Moreover, generalized reference in DBS is not limited to names, but includes reference by means of matching concepts (symbol) and pointing (indexical).

6 Reference by Matching (Symbol)

The reference mechanism based on matching uses the type-token relation (Sects. 2.1, 2.3) and is associated with the sign kind symbol. For example, the terms a blue square and blue squares in the sentence sequence John saw a blue square. ... Blue squares are rare. are related as follows.

6.1 Reference with Language Proplets in Token Lines

figure m

The vertical relation between the language and the context component shown in Sect. 5.1 reappears as a horizontal relation between proplets within token lines. The language proplets with the prn value 48 have non-empty sur slots, while the sur slots of the context proplets with the prn value 41 are empty. Reference by matching holds between the two blue proplets with the prn values 41 and 48 and similarly between the two square proplets. The distinction between the type and the token, here indicated after the core values, is usually left implicit.

The combination of the proplets blue and square by means of a functor-argument relation is coded by the features [mdd: square] and [mdr: blue], respectively. The noun proplet with the feature [sem: indef sg] is an indefinite singular, that with the feature [sem: indef pl] is an indefinite plural.

Next consider the same reference relation without language.

6.2 Reference by Matching Without Language

figure n

Here the reference relation holds between two nonlanguage contents – and not between a language content (meaning\(_{1}\)) and a nonlanguage content, as in Sect. 6.1.

Even though the reference relation is established between two individual proplet pairs in the same token lines, the combination into the complex content corresponding to blue square is accommodated as wellFootnote 17: in order to match, the two blue proplets must not only have the sameFootnote 18 core value, but also the same mdd continuation value, here square, and correspondingly for the mdr values of the two square proplets. Their fnc and prn values, however, are different.
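This matching condition may be sketched as follows: two proplets in the same token line corefer if their core and continuation values agree, while their fnc and prn values remain free. The helper and the choice of free attributes are illustrative assumptions.

```python
# Sketch of reference by matching (Sect. 6.2). For two proplets to match,
# the core value and the modifier/modified continuation values must agree;
# the fnc and prn values, in contrast, may differ.

def match(p, q, free=("fnc", "prn")):
    if set(p) != set(q):
        return False                  # different attribute inventories
    for attr in p:
        if attr in free:
            continue                  # fnc and prn may differ
        if p[attr] != q[attr]:
            return False              # core and continuation must agree
    return True

now    = {"noun": "square", "mdr": "blue", "fnc": "see",  "prn": 48}
stored = {"noun": "square", "mdr": "blue", "fnc": "rare", "prn": 41}
match(now, stored)   # True: same core value and same mdr value blue
```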

Generalizing reference by matching to include referring with nonlanguage content results in the following constellations.

6.3 Constellations of Generalized Reference

  1. Nonlanguage content referring to nonlanguage content

     Example: Agent identifies something seen with something seen before.

  2. Language content referring to nonlanguage content

     Example: Agent describes a landscape in speak mode.

  3. Nonlanguage content referring to language content

     Example: Agent identifies a current nonlanguage recognition with something it has read (for example, in a guide book) or heard about before.

  4. Language content referring to language content

     Example: Agent describes what it has heard or read.

Cognitive agents without language are capable of reference constellation 1 only, while agents with language may use all four.

7 Reference by Pointing (Indexical)

The second reference mechanism of cognition is based on pointing. In natural language, it is illustrated by the indexical signs, such as the pronouns. The first step toward a computational implementation is the linguistic observation that the indexicals point at only five different parameters, namely (1) first person, (2) second person, (3) third person, (4) place, and (5) time.

In English, the pronouns I, me, mine, we, and us point at the parameter for first person, you points at the parameter for second person, and he, him, his, she, her, it, they, them point at the parameter for third person. The indexical adjs here and there point at the parameter for place. The indexical adjs now, yesterday, and tomorrow point at the parameter for time.

The indexical nouns pointing at the parameters of first, second, and third person are varied by grammatical distinctions. Consider the following examples illustrating grammatical variation in 1st person pronouns of English.

7.1 1st Person Pronoun Distinctions

figure o

The proplets all share the indexical pointer pro1 as their core value. The different cat values s1 (first person singular), p1 (first person plural), and obq (oblique) control verb agreement, preventing, for example, *Me saw a tree or *Peter saw we. *I sees a tree and *he see a tree are prevented by using the different cat values s1 (singular 1st person) and s3 (singular 3rd person).
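How the cat values filter out ungrammatical combinations may be sketched as follows. The lexicon fragment, the simplified valency sets, and the helper agrees are assumptions made for illustration; object agreement (excluding *Peter saw we) would work analogously.

```python
# Sketch of subject-verb agreement via cat values (Sect. 7.1). A pronoun
# may fill the subject slot only if its cat value is among those the verb
# form accepts, ruling out *Me saw a tree, *I sees a tree, *he see a tree.

PRONOUNS = {
    "I":  {"noun": "pro1", "cat": "s1"},    # singular 1st person
    "me": {"noun": "pro1", "cat": "obq"},   # oblique
    "we": {"noun": "pro1", "cat": "p1"},    # plural 1st person
    "he": {"noun": "pro3", "cat": "s3"},    # singular 3rd person
}

VERBS = {
    "saw":  {"verb": "see", "subj": {"s1", "p1", "s3"}},  # past: any person
    "sees": {"verb": "see", "subj": {"s3"}},              # present 3rd sg
    "see":  {"verb": "see", "subj": {"s1", "p1"}},        # present non-3rd-sg
}

def agrees(pronoun, verb_form):
    return PRONOUNS[pronoun]["cat"] in VERBS[verb_form]["subj"]

agrees("I", "saw")    # True
agrees("me", "saw")   # False: oblique form cannot be the subject
agrees("I", "sees")   # False: s1 subject with an s3 verb form
```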

Indexical nouns combine in the same way into propositions as proplets of the sign kind symbol or name. Consider the DBS analysis of English I heard you..

7.2 Representing I heard you. As a Language Content

figure p

The question raised by this example is how the indexical pointers pro1 and pro2 are to be interpreted pragmatically relative to a context of use.

This leads to the second step of modeling the indexical reference mechanism. It is based on combining a propositional content with a cluster of parameter values of the agent’s current STAR (Hausser 1999, Sect. 5.3). The STAR is an acronym for (i) location (Space), (ii) time (Time), (iii) self-identity (Agent), and (iv) intended addressee (Recipient).

The STAR has two functions: (a) keeping track of the agent’s current situation (orientation) and (b) providing referents for indexicals occurring in contents.Footnote 19 A STAR is coded as a proplet, with the A value serving as the core value and as the owner. In a word bank, a temporal sequence of STARs records the output of the agent’s on-board orientation system and is listed as a token line.Footnote 20

7.3 Token Line Example of STARs Defined as Proplets

figure q

In addition to attributes represented by the letters of the STAR, there is a fifth, called 3rd, for third person indexicals. Though not required for the agent’s basic orientation, 3rd is needed to provide the referent for items which are neither 1st nor 2nd person.Footnote 21 As indicated by the prn values, e.g. [prn: 63–70], several consecutive propositions may share the same STAR.

In communication, three perspectives on content must be distinguished (Hausser 2011, Chaps. 10, 11). The STAR-0 is the agent’s perspective onto its current environment; it need not involve language. The STAR-1 is the agent’s speak mode perspective onto stored content as required for language production; if ongoing events are reported directly, the STAR-1 equals the STAR-0. The STAR-2 is the agent’s hear mode perspective onto language content as needed for the correct interpretation of indexicals. As an example of a STAR-0 perspective, consider the non-language content corresponding to I hear you.

7.4 Anchoring a Content to a STAR-0

figure r

This content differs from Sect. 7.2 because (i) it is nonlanguage (no sur values), (ii) the sem value of the verb is pres (present tense) rather than past, and (iii) a STAR is attached by having the same prn value as the content, here 63.

The STAR-0 shows the perspective of the agent Sylvester on his current environment. The S value specifies the location as the kitchen, the pres value of the verb points at the T value, the indexical pro1 points at the A value Sylvester the cat, and pro2 points at the R value Speedy the mouse.

Next Sylvester realizes the content in language by saying to Speedy I heard you. As time has moved on, the language content Sect. 7.2 is anchored to a second, later STAR-0 with the prn value 71 (Sect. 7.3). From these two STAR-0s, the agent computes the following STAR-1 perspective for the language content Sect. 7.2.

7.5 Speak Mode Anchoring to a STAR-1

figure s

The agent’s perspective is looking from his present situation back on the stored content Sect. 7.4 and encoding it in language. The content is connected to the agent’s current STAR via the common prn value 64. The content differs from that of Sect. 7.4 in (i) the sem value past (rather than pres) of the verb proplet and (ii) the language-dependent sur values of the content proplets.

When the language content I heard you is interpreted by the addressee (recipient), Speedy the mouse uses the content of the language sign and its current STAR-0 to derive the STAR-2 perspective. The result is as follows.

7.6 STAR-2 Perspective in Hear Mode

figure t

Speedy as the hearer uses his personal prn value and a different STAR: compared to the STAR of Sylvester, the A and R values are reversed and Sylvester’s I heard you is reinterpreted by Speedy’s STAR-2 perspective as you heard me.
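The derivation of the hearer's STAR-2 perspective may be sketched as follows: relative to the incoming sign, the A and R values are reversed and the hearer substitutes its own prn value, so that pro1 and pro2 swap their referents. The field names follow the STAR letters; the helper and the sample values are illustrative assumptions.

```python
# Sketch of computing the STAR-2 perspective in the hear mode (Sect. 7.6).
# Reversing A and R makes the speaker's "I heard you" come out, from the
# hearer's perspective, as "you heard me".

def star2_perspective(sign_star, hearer_prn):
    return {
        "S": sign_star["S"],
        "T": sign_star["T"],
        "A": sign_star["R"],   # the hearer is now the agent ...
        "R": sign_star["A"],   # ... and the speaker the recipient
        "prn": hearer_prn,     # the hearer uses its own prn value
    }

star1 = {"S": "kitchen", "T": "t71", "A": "Sylvester", "R": "Speedy", "prn": 64}
star2 = star2_perspective(star1, 83)
# star2: {'S': 'kitchen', 'T': 't71', 'A': 'Speedy', 'R': 'Sylvester', 'prn': 83}
```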

8 Reference by Generalized Baptism (Name)

In DBS, the reference mechanism of the sign kind name is also implemented as a cognitive operation. It consists of (i) establishing object permanence Footnote 22 and (ii) generalized baptism based on cross-copying between a name and its referent.

Object permanence is implemented as identity by address. It is coded by using an address as the core value of the non-initial proplets, pointing at the proplet representing the initial appearance of the referent.

8.1 Object Permanence by Using Address

figure u

The different prn values indicate that each member proplet is part of a different proposition, allowing different continuation values. The core values (dog 83) of the non-initial member proplets point at the initial proplet, which is the referent and formally recognizable by its non-address core value.

A token line like Sect. 8.1 may contain several initial dog referents, each referring to another individual. They are distinguished by their different prn values and the address numbers of the associated coreferent proplets. This is sufficient for the agent to properly discriminate between different dog referents in cognition and between their sets of coreferent proplets, all in the same token line.

It is not sufficient, however, for language communication. This is because the prn values of referents are not synchronized between agents. What is needed is a name surface and an interagent consensus on which item(s) the name refers to. The consensus is easily achieved: the not-yet-initiated agent follows the observed practice because communication would break down otherwise.

The DBS implementation is based on (i) a lexical name proplet which has a sur(face) value but no core value and (ii) a referent proplet which has a core value but no sur value. The two proplets are supplemented by an event of generalized baptism which cross-copies the sur value of the name into the sur slot of the referent and the core value of the referent into the core slot of the name.

8.2 Baptism as Cross-Copying

figure v

The named referent proplet is stored in the token line of the core value and used in the speak mode. The supplemented name proplet is stored in the token line of the surface and used in the hear mode.
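The cross-copying of Sect. 8.2 may be sketched as follows; the dict layout and the helper baptize are illustrative assumptions. The address is a (core, prn) pair, as in the object permanence coding of Sect. 8.1.

```python
# Sketch of generalized baptism as cross-copying (Sect. 8.2). The lexical
# name proplet has a sur value but no core value; the referent proplet has
# a core value but no sur value. Baptism fills each gap from the other
# proplet: the referent receives the surface, the name receives an address
# pointing at the referent.

def baptize(name, referent):
    address = (referent["noun"], referent["prn"])          # e.g. (daughter 21)
    named_referent = dict(referent, sur=name["sur"])       # used in speak mode
    supplemented_name = dict(name, noun=address)           # used in hear mode
    return named_referent, supplemented_name

name = {"sur": "Mary", "noun": None}
referent = {"sur": "", "noun": "daughter", "prn": 21}
named_referent, supplemented_name = baptize(name, referent)
# named_referent:    {'sur': 'Mary', 'noun': 'daughter', 'prn': 21}
# supplemented_name: {'sur': 'Mary', 'noun': ('daughter', 21)}
```

Note that the supplemented name carries no prn value, in keeping with its lexical quality described below (Sect. 8.4).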

The baptizing event is formalized as the following DBS inference.

8.3 Applying the Formal Baptizing Inference

figure w

The third content proplet is the named referent, the fourth the supplemented name.

Consider the following word bank containing three referents named Mary, referring to the grandmother, the mother, and the daughter in a family. The token lines are in the alphabetical order daughter, grandmother, Mary, mother.

8.4 Name Referring with Multiple Referents

figure x

The member proplets show the result of three baptism inferences like Sect. 8.3. In the token line of Mary, each supplemented name proplet occurs only once.

Supplemented names are not written into the lexicon because a core value like (daughter 21) is not a convention of the natural language at hand. Instead it is the result of a generalized baptism: for applying the inference Sect. 8.3,Footnote 23 it is sufficient for the uninitiated agent to witness the use of a name. The supplemented name proplets in the Mary token line do have a lexical quality, however, insofar as they have neither continuation nor prn values – like the lexical proplets resulting from automatic word form recognition.Footnote 24

When the hearer interprets a sentence containing a name, the name activates the corresponding token line, here that of Mary. The choice between different referents, here the grandmother, the mother, and the daughter, may have one of the following results: (i) the chosen referent equals the one intended by the speaker (correct choice), (ii) does not equal the one intended by the speaker (incorrect choice), or (iii) no referent is chosen (inconclusive result). The choice between multiple potential name referents is usually not at random, however. Instead, the referent most suitable to the utterance situation will usually be the correct one. If uncertainty remains, the hearer may ask for clarification.

For an agent in the speak mode, there is no ambiguity.Footnote 25 Instead, the speaker selects the intended referent, e.g. (daughter 21). If the agent acquired the appropriate name in the hear mode (Sect. 8.3), it is preserved in the word bank and may be used in the speak mode. If the agent is in the position to select and bestow a name, it is also available for realization.

9 Reference by Address (Coreference)

Coreference by address occurs with all three sign kinds. In name-based reference it is the only mechanism for relating the supplemented name to the named referent (Sect. 8.2). In reference by matching (symbol) and by pointing (indexical), in contrast, it is an additional method.