1 Introduction

The DialogBank (Footnote 1) is a new language resource, developed at Tilburg University, which contains dialogues of various kinds with gold standard dialogue act annotations according to the ISO 24617-2 standard (Footnote 2). This standard builds on previously designed annotation schemes such as DAMSL, DIT\(^{++}\), MRDA, HCRC Map Task, Verbmobil, SWBD-DAMSL, and DIT (Footnote 3). Most of these schemes have been used to construct annotated corpora, such as the Switchboard, HCRC Map Task, ICSI-MRDA, and DIAMOND corpora.

For nearly all of these annotation schemes, dialogue act annotation consists of segmenting a dialogue into certain grammatical units and marking up each unit with one or more communicative function labels. ISO 24617-2 supports semantically more complete annotation by additionally annotating the following aspects (considered in more detail in Sect. 2):

  1. ‘Dimension’: the annotation scheme supports multidimensional annotation, i.e. multiple communicative functions may be assigned to dialogue segments. Different from DAMSL and other multidimensional schemes, an explicitly defined notion of ‘dimension’ is used that corresponds to a certain category of semantic content. The ISO scheme distinguishes nine dimensions on empirical and theoretical grounds.

  2. ‘Qualifiers’ may be added for expressing that a dialogue act is performed conditionally, with uncertainty, or with a particular sentiment.

  3. Dependence relations are defined for expressing semantic relations between dialogue acts, e.g. for indicating which question is answered by a certain answer act (functional dependence relation), or which utterance a feedback act responds to (feedback dependence relation).

  4. Rhetorical relations may be annotated to indicate e.g. that one dialogue act contains the motivation for performing another dialogue act.

Most of the dialogues in the DialogBank have been taken from existing corpora and have been re-segmented and re-annotated; some of these also have their original annotations for comparison; this includes dialogues that were previously annotated according to the DIT\(^{++}\) annotation scheme, which has been a major source of inspiration for the ISO 24617-2 standard.

The DialogBank presently contains (re-)annotated dialogues from four English-language corpora: HCRC Map Task (Anderson et al. 1991), Switchboard (Jurafsky et al. 1997), TRAINS (Allen et al. 1994) and DBOX (Petukhova et al. 2014); and from four Dutch-language corpora: DIAMOND (Geertzen et al. 2004), Schiphol (Prüst et al. 1984), OVIS (www.let.rug.nl/vannoord/Ovis), and the Dutch Map Task corpus (http://doc.ukdataservice.ac.uk/doc/4632/mrdoc/pdf/4632userguide.pdf; Caspers 2000a, b). Dialogues from other corpora, such as the multi-party AMI corpus (http://groups.inf.ed.ac.uk/ami/corpus/), the Monroe corpus (Stent 2000), and the MIB corpus (Petukhova et al. 2016), and in other languages, such as Vietnamese (see Ngo et al. 2018), are planned to be added in the near future.

This paper is organized as follows. Section 2 briefly discusses the use of the ISO 24617-2 standard for the interoperable annotation of dialogue act information. Section 3 discusses the re-annotation (and re-segmentation) of dialogue data from existing corpora, using the pivot XML format of the DiAML markup language defined in the ISO standard. Section 4 introduces two alternative representation formats for ISO 24617-2 annotations, exploiting the distinction of the abstract and concrete syntax made in the definition of DiAML. The interoperability of the three representation formats is shown and their advantages and disadvantages are discussed. Section 5 is concerned with the limitations of the ISO standard that were brought to light during the re-annotation of existing dialogue data and the construction of mappings between different representations. Section 6, finally, contains conclusions from the experiences in building the DialogBank and indicates directions for future work.

2 Interoperable annotation and the ISO 24617-2 standard

2.1 Annotations and their representation

The main motivation for designing annotation standards is to promote the interoperability of annotated corpora. Interoperability of annotations is partly a matter of interchangeable representation formats, such as XML, but more importantly of the underlying concepts. Different annotations can be interpreted across platforms and frameworks only if they encode the same information, or information that can be interpreted through a well-defined mapping. Because interoperability at the conceptual and semantic levels is of more fundamental importance than interoperability at the level of representation formats, the design of ISO 24617-2 has focused on the identification and specification of empirically and theoretically well-motivated concepts and precise definitions.

ISO 24617-2 represents a comprehensive, application-independent annotation scheme with well-defined concepts and the markup language DiAML (Dialogue Act Markup Language), designed in accordance with the ISO Linguistic Annotation Framework (LAF) (Footnote 4) and the ISO Principles of Semantic Annotation (‘SemAF Principles’) (Footnote 5). LAF makes a fundamental distinction between annotation and representation: ‘annotation’ refers to the linguistic information that is added to segments of language data, independent of format; ‘representation’ refers to the rendering of annotations in a particular format.

Following SemAF Principles, this distinction is implemented in the DiAML definition in the form of an abstract syntax that specifies a class of conceptual annotation structures, which are set-theoretical constructs like pairs and triples of concepts, and a concrete syntax that specifies a rendering of these annotation structures in a reference format using XML. This reference format is called DiAML-XML. It uses abbreviated XML-expressions, and it is complete and unambiguous relative to the abstract syntax, i.e. (1) the concrete syntax defines a representation for every structure defined by the abstract syntax; and (2) every expression defined by the concrete syntax represents one and only one structure defined by the abstract syntax. A format with these properties is called ideal. Any ideal representation format can be converted through a meaning-preserving mapping to any other ideal format (see Bunt 2010 for formal definitions and proofs). This is discussed in connection with alternative representations of annotations in the DialogBank in Sect. 4.

The dialogues in the DialogBank have all been (re-)annotated using the DiAML markup language and the DiAML-XML representation format; they have additionally been cast in two alternative representation formats, defined in such a way that they are demonstrably ideal (complete and unambiguous) and more convenient for human readers than XML-based representations.

2.2 Main features of ISO 24617-2 annotations

As mentioned in the Introduction, ISO 24617-2 annotations differ from most other existing dialogue act annotation schemes in using semantically well-defined dimensions, qualifiers, and relations among dialogue acts, including functional dependence relations, feedback dependence relations, and rhetorical relations. Each of these features is briefly described here.

Dimensions: Utterances in dialogue often have more than one communicative function, as several authors have observed (Allwood 1992; Bunt 1994, 2011; Popescu-Belis 2005; Traum 2000). The following dialogue fragment illustrates this:

(1)

1. Anne: Henry, can you take us through these slides?

2. Henry: Ehm... sure, just ordering my notes.

In the first utterance, Anne makes a request and assigns the next speaking turn to Henry. In the second utterance, Henry accepts the turn and stalls for time, accepts the request, and explains why he does not fulfill the request right away. The DIT\(^{++}\) annotation scheme was designed to optimally support the annotation of multifunctional utterances (Bunt 2009, 2011). It is based on a well-founded notion of dimension, inspired by the observation that participants in a dialogue perform a range of communicative activities beyond those that relate directly to performing a certain task or activity. They also give and elicit feedback, take turns, stall for time, and demonstrate and monitor attention; moreover, they often perform several of these activities at the same time. The term ‘dimension’ refers to these various types of communicative activity.

The ISO 24617-2 annotation scheme inherits the following nine dimensions from the DIT\(^{++}\) scheme: (1) Task: dialogue acts that move forward the task or activity which motivates the dialogue; (2–3) Feedback, divided into Auto- and Allo-Feedback: acts providing or eliciting information about the processing of previous utterances by the current speaker or by the current addressee, respectively; (4) Turn Management: activities for obtaining, keeping, releasing, or assigning the right to speak; (5) Time Management: acts for managing the use of time in the interaction; (6) Discourse Structuring: dialogue acts dealing with topic management, opening and closing (sub-)dialogues, or otherwise structuring the dialogue; (7–8) Own- and Partner Communication Management: actions by which the sender edits his own current contribution or the contribution of another speaker, respectively; (9) Social Obligations Management: dialogue acts for greeting, thanking, apologizing, and other social conventions in communication.

The ISO 24617-2 inventory of communicative functions contains 56 functions, subdivided into general-purpose and dimension-specific functions. Dimension-specific communicative functions are specific for a particular dimension; for instance Turn Take is specific for Turn Management; Stalling is specific for Time Management, and Self-Correction is specific for Own Communication Management. General-purpose communicative functions, by contrast, can be used in any dimension; for example, “You misunderstood me” is an Inform in the Allo-Feedback dimension, and “Tony, will you take over please” is a Request in the Turn Management dimension. All types of question, statement, and answer can be used in any dimension, and the same is true for commissive and directive functions, such as Offer, Suggest, and Request. Table 1 lists the communicative functions defined in ISO 24617-2.
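The division between general-purpose and dimension-specific functions can be sketched as a small validity check. The snippet below is only an illustration: it covers the functions named in this section rather than the full inventory of Table 1, and the camel-case identifiers and the helper name `is_valid_pair` are assumptions, not part of the standard.

```python
# The nine ISO 24617-2 dimensions (identifier spelling is an assumption).
DIMENSIONS = {
    "task", "autoFeedback", "alloFeedback", "turnManagement",
    "timeManagement", "discourseStructuring",
    "ownCommunicationManagement", "partnerCommunicationManagement",
    "socialObligationsManagement",
}

# Illustrative subset only: general-purpose functions named in the text.
GENERAL_PURPOSE = {"inform", "request", "offer", "suggest", "answer"}

# Dimension-specific functions named in the text, with their one dimension.
DIMENSION_SPECIFIC = {
    "turnTake": "turnManagement",
    "stalling": "timeManagement",
    "selfCorrection": "ownCommunicationManagement",
}


def is_valid_pair(function: str, dimension: str) -> bool:
    """Return True if the function may be used in the given dimension."""
    if dimension not in DIMENSIONS:
        return False
    if function in GENERAL_PURPOSE:
        return True  # general-purpose functions combine with any dimension
    return DIMENSION_SPECIFIC.get(function) == dimension


# "You misunderstood me": an Inform in the Allo-Feedback dimension.
print(is_valid_pair("inform", "alloFeedback"))
# "Tony, will you take over please": a Request in Turn Management.
print(is_valid_pair("request", "turnManagement"))
# Stalling is specific to Time Management, so it cannot occur in Task.
print(is_valid_pair("stalling", "task"))
```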

Table 1 ISO 24617-2 communicative functions

Qualifiers: Three types of qualifiers are included in ISO 24617-2, namely for indicating a speaker’s (un-)certainty, (un-)conditionality, and sentiment. For certainty only two rather coarse-grained qualifiers are defined, certain and uncertain, and likewise for conditionality: conditional and unconditional; (2) and (3) below show examples of these qualifiers.

(2)

B: That’s just the way their minds work

A: the stamina that you must draw from yourself to deal with it ... I guess you find out that you’re a much stronger person than you thought, maybe

(3)

P2: Shall we place these buttons at the bottom?

P3: Only if they have a clearly different shape or colour.

For sentiment the values positive and negative have been used in some dialogue annotations; however, the ISO standard does not specify any particular set of sentiment qualifiers; such values are expected to be provided by ongoing research on sentiment analysis and representation. The different qualifiers are applicable to different classes of dialogue acts. Sentiment qualifiers are applicable to any dialogue act with a general-purpose function; conditionality qualifiers to dialogue acts with a commissive or directive function (Promise, Offer, Suggestion, Request, etc.); and certainty qualifiers are applicable to dialogue acts with an ‘information-providing’ function (Inform, Agreement, Disagreement, Correction, Answer, Confirm, Disconfirm).
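The applicability rules for qualifiers can be sketched as a lookup from a communicative function to the qualifier types it admits. This is a minimal illustration under stated assumptions: the function sets contain only the names mentioned above (the commissive and directive list is open-ended in the text), the `question` entry is a stand-in for the remaining general-purpose functions, and `applicable_qualifiers` is a hypothetical helper name.

```python
# Functions with an 'information-providing' character, as listed in the text.
INFORMATION_PROVIDING = {
    "inform", "agreement", "disagreement", "correction",
    "answer", "confirm", "disconfirm",
}

# Commissive and directive functions; the text ends this list with "etc.",
# so the set is knowingly incomplete.
COMMISSIVE_OR_DIRECTIVE = {"promise", "offer", "suggestion", "request"}

# Partial stand-in for the general-purpose functions (assumption).
GENERAL_PURPOSE = INFORMATION_PROVIDING | COMMISSIVE_OR_DIRECTIVE | {"question"}


def applicable_qualifiers(function: str) -> set:
    """Return the qualifier types that may be attached to the function."""
    kinds = set()
    if function in GENERAL_PURPOSE:
        kinds.add("sentiment")
    if function in COMMISSIVE_OR_DIRECTIVE:
        kinds.add("conditionality")
    if function in INFORMATION_PROVIDING:
        kinds.add("certainty")
    return kinds


# Example (3): P3's conditional acceptance has a commissive character.
print(sorted(applicable_qualifiers("offer")))
# Example (2): A's uncertain statement is information-providing.
print(sorted(applicable_qualifiers("inform")))
# Dimension-specific functions such as Stalling take no qualifiers here.
print(sorted(applicable_qualifiers("stalling")))
```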

Functional dependence relations are indispensable for the interpretation of dialogue acts that are responsive in nature, such as Answer, Confirmation, Disagreement, Accept Apology, and Decline Offer. The meaning of these acts depends crucially on the dialogue act that they respond to. Functional dependence relations connect occurrences of such dialogue acts to their ‘antecedent’ and correspond to links for marking up a segment not only as having the function of an answer, for example, but also indicating which question is answered.

Note that ISO 24617-2 in its present form does not support the marking up of the semantic content of a dialogue act (but a future revision may be extended in this direction; see Bunt et al. 2017a, 2018b); currently, the only information about the semantic content of a dialogue act is in the marking up of its dimension, which can be viewed as indicating a type of semantic content (e.g., the content of a dialogue act in the Task dimension is task-related information; that in a feedback dimension is processing information; that in the Turn Management dimension is about the allocation of the speaker role, etc.). Dialogue acts have a formal semantics in terms of updating the information states of dialogue participants (see Bunt 2014) which interprets DiAML annotations as functions that, when applied to a semantic content, yield update operations.

Feedback dependence relations play a similar role for interpreting feedback acts as functional dependence relations for responsive dialogue acts; their meaning is partly or entirely determined by the utterance(s) that the feedback refers to. This is obvious for ‘inarticulate’ feedback acts, like “OK” and “Yes”. Feedback acts often refer to the immediately preceding utterance, but can also refer further back and to more than one utterance (Petukhova 2011). The ISO 24617-2 annotation scheme therefore includes links for marking up these dependences; an example is shown in (9b).

Rhetorical relations have been studied mostly for their occurrence in written texts, where they are crucial for a full understanding of the individual sentences, but they also play a role in spoken dialogue where they occur in two different ways, illustrated in the following examples (where the participants talk about remote controls and their design):

(4)

1. A: I can never find them.

2. B: That’s because they don’t have a fixed location.

(5)

1. A: Where would you position the buttons?

2. A: I think that has some impact on many things

In (4) the dialogue acts expressed by A’s and B’s utterances are related by a Cause relation between their respective semantic contents: the content of the second causes the content of the first; in (5), by contrast, the second dialogue act forms a reason for performing the first, so the causal relation is between the first dialogue act and the semantic content of the second, rather than between their respective semantic contents. The two cases of a causal relation are known in the literature as ‘semantic cause’ and ‘pragmatic cause’. A similar distinction can be made for many other discourse relations. The annotation of a rhetorical relation is illustrated in example (9b).

Different from functional and feedback dependences, which are an integral part of dialogue acts with a responsive function and of feedback acts, respectively, rhetorical relations give additional information about the ways in which dialogue acts are semantically or pragmatically related. The ISO 24617-2 standard does not specify any particular set of rhetorical relations, but rather expects such a set to be provided by ongoing research in that area, similar to the case of qualifiers for sentiment or emotion (see e.g. Burkhardt et al. 2017). Since the establishment of the ISO 24617-2 standard in 2012, another ISO standard has been defined concerned with the annotation of semantic rhetorical relations (also called ‘discourse relations’). This standard, ISO 24617-8 (2016), does not claim to provide a complete annotation scheme for the annotation of rhetorical relations, but rather provides precise, ‘standard’ definitions for a number of core discourse relations that are found in many different schemes that have been proposed; the ISO standard is therefore also known as ‘DR-Core’ (see Bunt and Prasad 2016) (Footnote 6). In building dialogue corpora annotated according to the ISO 24617-2 standard, it has become common practice to use the DR-Core set of relations extended with a few other relations, notably from the Penn Discourse Treebank (PDTB; see Prasad and Bunt 2015). This is considered further in Sect. 5.2.

2.3 Segmentation

According to ISO 24617-2, dialogue acts are expressed by ‘functional segments’ of linguistic or other communicative behaviour, defined as minimal stretches of communicative behaviour that have a communicative function, ‘minimal’ in the sense of not including any material that does not contribute to the expression of that function (or to the specification of the semantic content). Functional segments are mostly shorter than turns, may be discontinuous, may overlap, and may contain parts contributed by different speakers. A segment carrying a feedback function, for instance, frequently overlaps with a segment that carries a task-related function.

The requirement of functional segments to be ‘minimal’ has been added in order for communicative functions to be assigned as accurately as possible to those stretches of behaviour that express one or more dialogue acts. The following example illustrates this:

(6)

Can you tell me what time the train to ehm,... Viareggio leaves?

The speaker interrupts himself while formulating a request for information since he needs a bit of time to produce the name of the destination. The small interrupting segment ehm,... does not contribute to the expression of the request, so according to the minimality condition it does not belong to the functional segment that expresses the request. The utterance in (6) should thus be analysed as consisting of two functional segments: the discontinuous segment Can you tell me what time the train to [ ] Viareggio leaves? expresses a request, and the segment ehm,... expresses a Stalling act. This can be annotated in DiAML as follows, where ‘fs1’ and ‘fs2’ indicate the two functional segments:

(7)

[Image: DiAML-XML annotation of the functional segments fs1 and fs2 in (6)]

Note that in this example the yes-no question of the form Can you tell me... has been interpreted as a conditional request, i.e. as: “Please tell me, if you can,...”.
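As a rough sketch of how the analysis of (6) could be encoded, the Python snippet below represents the two functional segments as lists of token indices and builds one `<dialogueAct>` element for each. It is not a copy of the original annotation: the tokenization, the index-based segment encoding, the participant identifiers `#s` and `#a`, and the attribute names (`target`, `sender`, `addressee`, `dimension`, `communicativeFunction`, `conditionality`) are assumptions based on common DiAML-XML usage and should be checked against the standard.

```python
import xml.etree.ElementTree as ET

# Tokens of utterance (6); the tokenization itself is an illustrative choice.
TOKENS = ["Can", "you", "tell", "me", "what", "time", "the", "train", "to",
          "ehm,...", "Viareggio", "leaves?"]

# Functional segments as lists of token indices (hypothetical encoding).
# fs1 is discontinuous: it skips token 9, which forms fs2 on its own.
SEGMENTS = {
    "fs1": [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11],
    "fs2": [9],
}


def segment_text(segment_id: str) -> str:
    """Return the words of a (possibly discontinuous) functional segment."""
    return " ".join(TOKENS[i] for i in SEGMENTS[segment_id])


def dialogue_act(act_id, segment_id, dimension, function, **qualifiers):
    """Build a <dialogueAct> element; attribute names are assumptions."""
    attributes = {
        "xml:id": act_id,
        "target": "#" + segment_id,
        "sender": "#s",       # hypothetical participant identifiers
        "addressee": "#a",
        "dimension": dimension,
        "communicativeFunction": function,
    }
    attributes.update(qualifiers)
    return ET.Element("dialogueAct", attributes)


# The yes-no question form is interpreted as a conditional request.
request = dialogue_act("da1", "fs1", "task", "request",
                       conditionality="conditional")
stalling = dialogue_act("da2", "fs2", "timeManagement", "stalling")

for act in (request, stalling):
    print(ET.tostring(act, encoding="unicode"))
    print("  segment:", segment_text(act.get("target").lstrip("#")))
```

Keeping the stalling token out of `fs1` is what the minimality condition amounts to in this encoding: each function is attached only to the material that expresses it.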

A functional segment is most often a part of what is contributed by the participant who occupies the speaker role, but it may happen that a dialogue act is spread over multiple turns, as in the following example, where the utterances in turns 6, 8, 11, and 13 together form the functional segment that contains B’s answer to the question in turn 5:

(8)

1. A: I’ve skied in Colorado, and we usually go to New Mexico because it’s a little cheaper —

2. B: Ooh,

3. A: — you know

4. B: Uh-huh

5. B: Where in Colorado?

6. A: I’ve been to Telluride, which is on the West side,

7. B: Yes

8. A: and, uh, Copper

9. A: Copper is kind of my favorite up there

10. B: Really?

11. A: Breckennridge —

12. B: Uh-huh

13. A: — and Keystone

This example forms a tricky case for segmentation and dialogue act annotation, for although the answer is not complete until turn 13, participant B provides intermediate feedback in turns 7, 10, and 12, and participant A provides, in turn 9, an intermediate assessment of the answer part in turn 8, so these answer parts seem to deserve a dialogue act-like status as well. See also the discussion in Sect. 5.1.

2.4 ISO 24617-2 metamodel

The metamodel, displayed in Fig. 1, shows the classes of concepts that are used in ISO 24617-2 annotations. It indicates that a dialogue act has one sender, one or more addressees, zero or more other participants (such as bystanders or an audience; see Clark (1996)), one dimension, one communicative function, zero or more functional and feedback dependence relations, possibly one or more qualifiers, and possibly one or more rhetorical relations to other dialogue acts.
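The cardinalities stated by the metamodel can be sketched as a data structure with a simple validity check. The class below is a hypothetical illustration, not part of the standard: the field names are assumptions, and only the constraint that is not already enforced by the field types (at least one addressee) is checked explicitly.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DialogueAct:
    """One dialogue act, with the cardinalities given by the metamodel."""
    sender: str                      # exactly one sender
    addressees: List[str]            # one or more addressees
    dimension: str                   # exactly one dimension
    function: str                    # exactly one communicative function
    other_participants: List[str] = field(default_factory=list)      # zero or more
    functional_dependences: List[str] = field(default_factory=list)  # zero or more
    feedback_dependences: List[str] = field(default_factory=list)    # zero or more
    qualifiers: List[str] = field(default_factory=list)              # zero or more
    # (relation name, identifier of the related dialogue act) pairs
    rhetorical_relations: List[Tuple[str, str]] = field(default_factory=list)

    def __post_init__(self):
        if not self.addressees:
            raise ValueError("a dialogue act needs at least one addressee")


# Anne's first utterance in example (1): a request in the Task dimension.
request = DialogueAct(sender="anne", addressees=["henry"],
                      dimension="task", function="request")
print(request)
```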

According to the metamodel, the ingredients that make up an ISO 24617-2 annotation are those listed in Table 2, where the second column indicates the number of each kind of element in an annotation structure.

Table 2 Ingredients of ISO 24617-2 annotations

Of these elements, rhetorical relations strictly speaking fall outside the scope of ISO 24617-2, which makes only a minimal provision for specifying a rhetorical relation between dialogue acts and does not specify any particular set of such relations. As mentioned above, it has become common practice among users of the standard to include annotations of rhetorical relations using the DR-Core relations defined in ISO 24617-8 (see Bunt et al. 2017), sometimes with some extensions.

Fig. 1 ISO 24617-2 metamodel

2.5 Annotations in DiAML-XML

The representation of annotations in DiAML-XML makes use of two XML elements, one to represent individual dialogue acts and one to represent rhetorical relations between dialogue acts. A <dialogueAct> element has attributes whose values represent the following components, corresponding with the components listed in Table 2:

  • the speaker, the addressee(s), and any other participants (possibly none);

  • the communicative function and the dimension;

  • qualifiers (if any); and

  • functional and feedback dependence relations.

Example (9b) shows the use of these XML elements in the representation of the annotation of the dialogue fragment (Footnote 7) in (9a), which contains a rhetorical relation (Elaboration) between the dialogue acts in utterances 1 and 3, and a feedback dependence between the dialogue acts in utterances 3 and 4.

(9)

a.

1. G: go south and you’ll pass some cliffs on your right

2. F: uhm...

3. G: and some adobe huts on your left

4. F: oh okay

b.

[Image: DiAML-XML representation of the annotation of (9a)]

It may be noted that DiAML-XML is a compact way of using XML for representing annotation structures. For example, the annotation in (9b) can be regarded as abbreviating the standard XML expression in (10), where ‘fs’ stands for ‘feature structure’ and ‘f’ for ‘feature’ (following ISO standard 24610 for representing feature structures) (Footnote 8).

The fact that a DiAML-XML expression can be viewed as abbreviating a standard full XML form is useful for combining dialogue act annotations with annotations of other semantic or pragmatic information. This is discussed in Sect. 5.3.
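To make the role of the two element types concrete, the sketch below parses a DiAML-like rendering of fragment (9a) and checks that every dependence and rhetorical link points to an existing dialogue act or segment. Everything beyond what the text states is an assumption: the `<diaml>` wrapper, the `rhetoricalLink` element and its attribute names, the participant identifiers, and the communicative functions assigned to utterances 1 to 4 are illustrative guesses, not the content of the original (9b).

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical rendering of (9a); only the Elaboration relation between
# utterances 1 and 3 and the feedback dependence of utterance 4 on
# utterance 3 are taken from the text.
FRAGMENT = """
<diaml>
  <dialogueAct xml:id="da1" target="#fs1" sender="#g" addressee="#f"
               dimension="task" communicativeFunction="instruct"/>
  <dialogueAct xml:id="da2" target="#fs2" sender="#f" addressee="#g"
               dimension="timeManagement" communicativeFunction="stalling"/>
  <dialogueAct xml:id="da3" target="#fs3" sender="#g" addressee="#f"
               dimension="task" communicativeFunction="inform"/>
  <dialogueAct xml:id="da4" target="#fs4" sender="#f" addressee="#g"
               dimension="autoFeedback" communicativeFunction="autoPositive"
               feedbackDependence="#da3"/>
  <rhetoricalLink dact="#da3" rhetoAntecedent="#da1" rhetoRel="elaboration"/>
</diaml>
"""


def dangling_references(root):
    """List (element id, attribute, reference) triples that point nowhere."""
    acts = {el.get(XML_ID) for el in root.iter("dialogueAct")}
    segments = {el.get("target").lstrip("#") for el in root.iter("dialogueAct")}
    missing = []
    for el in root.iter("dialogueAct"):
        ref = el.get("functionalDependence")
        if ref and ref.lstrip("#") not in acts:
            missing.append((el.get(XML_ID), "functionalDependence", ref))
        # Feedback may refer to a dialogue act or to a segment.
        ref = el.get("feedbackDependence")
        if ref and ref.lstrip("#") not in acts | segments:
            missing.append((el.get(XML_ID), "feedbackDependence", ref))
    for link in root.iter("rhetoricalLink"):
        for attr in ("dact", "rhetoAntecedent"):
            ref = link.get(attr)
            if ref is None or ref.lstrip("#") not in acts:
                missing.append(("rhetoricalLink", attr, ref))
    return missing


root = ET.fromstring(FRAGMENT)
print(dangling_references(root))
```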

(10)

[Image: expanded XML feature-structure representation of the annotation in (9b)]

3 Data in the DialogBank

3.1 Overview

Most of the dialogues in the DialogBank have been taken from existing corpora. To become ISO-compliant, in most cases their original segmentation as well as their annotation and their representation format needed to be adapted. Some of the dialogues had previously been annotated with a version of the DIT\(^{++}\) annotation scheme (Footnote 9), on which the ISO 24617-2 standard is largely based, in which case only relatively minor adjustments were needed. In all cases, the annotations were double-checked for errors, omissions, and inconsistencies in order to ensure gold standard quality. To facilitate the inspection and correction of existing annotations it was sometimes convenient to be able to inspect the annotation representations in a tabular form; this is briefly discussed in Sect. 3.2.

The dialogues from the Switchboard (SWBD-DA) corpus were originally annotated with communicative function labels from the SWBD-DAMSL annotation scheme. Fang et al. (2011, 2012a, b) applied semi-automatic procedures for replacing the SWBD-DAMSL tags by ISO 24617-2 function tags while retaining the SWBD-DA segmentation, showing that 84% of the re-tagging can be done automatically. The resulting ‘SWBD-ISO’ corpus forms a resource ‘halfway’ between the SWBD-DA corpus and an ISO-annotated version. The Switchboard dialogues in the DialogBank were re-segmented according to the finer-grained ISO 24617-2 segmentation into functional segments, and annotated with ISO 24617-2 tags, adding qualifiers, functional and feedback dependence relations, and DR-Core rhetorical relations.

The dialogues from the HCRC Map Task and TRAINS corpora have previously been re-annotated according to the DIT\(^{++}\) annotation scheme, release 5 (see http://dit.uvt.nl) using the ANVIL tool (Kipp 2001, 2014; Bunt et al. 2012). These annotations were enriched with DR-Core rhetorical relations, and their ‘DiAML-Anvil’ format was adjusted to fully comply with the DiAML-XML format.

The dialogues of the DBOX corpus were annotated with the ANVIL tool according to the ISO 24617-2 annotation scheme with minor extensions, justified by domain-specific requirements; see Petukhova et al. (2014). They only needed some reformatting.

The dialogues in the DIAMOND corpus were annotated with the communicative functions and dimensions of DIT\(^{++}\) release 3, using the DitAT annotation tool (see Sect. 3.2).

The dialogues in the Dutch Map Task corpus were collected with the primary aim to study the phonology and phonetics of intonation in dialogue (see Caspers 2000a, b). They were segmented and annotated according to ISO 24617-2 from scratch, adding qualifiers, dependence relations and DR-Core rhetorical relations.

The dialogues from the OVIS and Schiphol corpora were annotated with the communicative functions and dimensions of DIT\(^{++}\) release 3 and produced with the ANVIL tool. They were re-annotated from scratch.

Table 3 summarizes the annotations and representations of the material in the DialogBank. The next subsection describes the representation of DiAML annotations in the tabular formats that have been defined to facilitate the inclusion of corrected annotated dialogue material in the DialogBank.

Besides the annotated dialogues, the DialogBank also contains detailed guidelines for using the ISO 24617-2 standard, practical tips for constructing ISO-compliant annotations, software for reformatting annotation representations, and an online bibliography.

Table 3 Types of data in the DialogBank

3.2 SWBD-DAMSL and DitAT annotations

As mentioned above, some of the dialogues in the DialogBank were previously annotated using tabular formats. This is illustrated in Table 4 by a dialogue fragment as originally annotated in the Switchboard-DA corpus, and in Table 5 by a dialogue fragment from the TRAINS corpus, annotated with DIT\(^{++}\) communicative functions and produced with the DitAT annotation tool (Geertzen 2007), which was developed in order to support multidimensional dialogue annotation and analysis.

Table 4 Annotation of Switchboard (SWBD-DA) dialogue fragment
Table 5 Representation in tabular form of DIT\(^{++}\) 4.0 annotations produced with the DitAT tool for a fragment of a TRAINS dialogue

Although the representations in Tables 4 and 5 look rather different, and very different from the XML format used in (9), they all contain largely the same information. The row numbered 4 in Table 5, for example, corresponds to the following XML expression (with dialogue act identifiers added), where ‘fs4’ identifies the functional segment “yes hello, maybe”:

(11)

[Image: DiAML-XML expression corresponding to row 4 of Table 5]

The two tabular formats shown here have the limitation that only contiguous, non-overlapping functional segments can be represented. The full DiAML-XML annotation of this example, with functional and feedback dependence relations and a certainty qualifier, is shown in (12).

(12)

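[Image: full DiAML-XML annotation of the example, with functional and feedback dependence relations and a certainty qualifier]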

4 Interoperability of representations

4.1 Abstract syntax and alternative representations

The distinction of an abstract syntax, besides a concrete representation format (Bunt 2010), allows a precise determination of the interoperability of alternative representations. Figure 2 displays the relations between an abstract syntax, one or more alternative ideal (complete and unambiguous) representation formats, and the semantics of a markup language.

Fig. 2 Abstract and concrete syntax, and semantics

Since the DiAML-XML format is defined as ideal (complete and unambiguous) for representing the annotation structures defined by the DiAML abstract syntax, a function \(F_{XML}\) can be defined that maps DiAML annotation structures to DiAML-XML expressions, and this function has an inverse \(F_{XML}^{-1}\) which maps any DiAML-XML expression to the annotation structure that it encodes.

Representations in tabular form, like those in Tables 4 and 5, have several advantages over representations in XML:

  1. They are less verbose and, partly for that reason, more convenient for inspection and correction. They share this advantage with e.g. JSON representations (see Crockford 2009).

  2. Specific tabular formats allow easy comparison with other pre-existing formats; e.g., a 3-column format for ISO 24617-2 annotations allows easy comparison with SWBD-DAMSL annotations.

  3. Tabular formats may be tuned to the multidimensional structure of the ISO annotation scheme; e.g., the format of Table 5 allows one to see the multifunctionality of the utterances in a dialogue at a glance.

For these reasons, two tabular formats for representing DiAML annotations were devised that overcome the limitations of the formats used in Tables 4 and 5: one inspired by the SWBD-DAMSL annotation format, called DiAML-TabSW, and one tuned to the multidimensionality of ISO 24617-2, called DiAML-MultiTab. The completeness and unambiguity of both formats are shown in Bunt et al. (2016), where the encoding functions \(F_{MultiTab}\) and \(F_{TabSW}\) are defined as well as their inverses. Compositions of encoding and decoding functions, such as \(F_{XML} \circ F_{MultiTab}^{-1}\), define a conversion from one representation format to another.
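The idea of converting between formats by composing a decoding with an encoding can be illustrated with a toy example. The sketch below is not the DiAML-MultiTab or DiAML-XML definition of Bunt et al. (2016): the abstract structures are reduced to four-field tuples, the table has one column per dimension for a three-dimension subset, the XML attribute names are assumptions, and each segment is assumed to carry at most one function per dimension.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple


class Act(NamedTuple):
    """Toy abstract annotation structure (a reduced stand-in)."""
    segment: str
    sender: str
    dimension: str
    function: str


# Column order of the toy table: a subset of the nine dimensions.
DIMS = ["task", "turnManagement", "timeManagement"]


def encode_tab(acts):
    """Toy tabular encoding: one row per segment, one column per dimension."""
    rows = {}
    for act in sorted(acts):
        cells = rows.setdefault((act.segment, act.sender), {d: "" for d in DIMS})
        cells[act.dimension] = act.function
    return [[seg, snd] + [cells[d] for d in DIMS]
            for (seg, snd), cells in rows.items()]


def decode_tab(rows):
    """Inverse of encode_tab: recover the set of abstract structures."""
    return frozenset(
        Act(segment, sender, dimension, function)
        for segment, sender, *cells in rows
        for dimension, function in zip(DIMS, cells)
        if function
    )


def encode_xml(acts):
    """Toy XML encoding: one <dialogueAct> element per abstract structure."""
    root = ET.Element("diaml")
    for act in sorted(acts):
        ET.SubElement(root, "dialogueAct", target="#" + act.segment,
                      sender="#" + act.sender, dimension=act.dimension,
                      communicativeFunction=act.function)
    return ET.tostring(root, encoding="unicode")


def decode_xml(text):
    """Inverse of encode_xml."""
    return frozenset(
        Act(el.get("target")[1:], el.get("sender")[1:],
            el.get("dimension"), el.get("communicativeFunction"))
        for el in ET.fromstring(text).iter("dialogueAct")
    )


def tab_to_xml(rows):
    """Conversion as a composition: encode_xml after decode_tab."""
    return encode_xml(decode_tab(rows))


# A multifunctional reading of example (1); the segmentation is illustrative.
ROWS = [
    ["fs1", "anne", "request", "turnAssign", ""],
    ["fs2", "henry", "", "turnAccept", "stalling"],
    ["fs3", "henry", "acceptRequest", "", ""],
]
print(tab_to_xml(ROWS))
```

Because both toy encodings can be decoded back to the same set of tuples, converting the table to XML and decoding the result loses nothing, which is the property that the inter-convertibility of ideal formats rests on.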

The inter-convertibility of the three DiAML formats is exploited in the DialogBank by allowing users to view annotations in the form that is most convenient to them, as well as by converting the tabular formats to the XML format for automatic processing, if desired.

4.2 DiAML abstract syntax

The abstract syntax of DiAML reflects the conceptual analysis of dialogue acts that underlies the ISO 24617-2 annotation scheme, as expressed in the metamodel in Fig. 1. A dialogue act is thus characterized by the following seven elements:

  1. the sender; every dialogue act has exactly one sender who is ‘responsible’ for the act, even though more than one speaker may contribute; see example (13):

(13)

1. A: and then should I specify the uhm, uhm,

2. B: budget code, you should specify the budget code, that’s 5611

In this example, A is struggling to formulate a question and B helps by providing the term that A was looking for. The first part of B’s utterance is a dialogue act with the communicative function Completion, in the Partner Communication Management dimension. The functional segment “and then should I specify the budget code”, made up of parts of what A and B say, expresses a question for which A is ‘responsible’ and is considered as the sender. The second part of B’s utterance “you should specify the budget code” is an answer to that question (and the third part is an elaboration of that answer).

  2. one or more addressees; in a two-person dialogue the addressee is just the one who is not the sender; in multiparty dialogues, such as those of the AMI corpus, all the participants who are not the sender are addressees, unless the speaker picks out one of them (in which case the other participants form the ‘other participants’).

  3. zero or more other participants (if any), such as a bystander or an audience, or other side-participants (see Clark 1996);

  4. the communicative function;

  5. the dimension;

  6. zero or more functional dependence relations or feedback dependence relations;

  7. zero or more qualifiers of certainty, conditionality, and/or sentiment.

Whether a dialogue act has a dependence relation to another dialogue act is determined by its communicative function and dimension. A functional dependence means that the semantic content of a dialogue act is co-determined by the semantic content of a previous dialogue act, due to having a communicative function of a responsive character. This is for example the case for answers, whose meaning is partly determined by the question that is being answered, but also for the acceptance or rejection of offers, suggestions, requests, and the acceptance of apologies and thankings.

The semantic content of a feedback act (in the Auto-Feedback or in the Allo-Feedback dimension) is partly determined by what the feedback is about. Feedback utterances like “OK”, “Yes”, and “Really?” illustrate this. While positive feedback acts are typically about the processing of previous dialogue acts, negative feedback acts are often about a problem in understanding something, and may thus refer to a segment of speech rather than to its interpretation as a dialogue act. ISO 24617-2 therefore allows feedback dependence relations to have both dialogue acts and dialogue segments as antecedents.

Since responsive dialogue acts and feedback acts are semantically incomplete without the specification of functional and feedback dependences, these are part of the structures that are used to annotate such acts.

Unlike functional and feedback dependence relations, rhetorical relations are not part of the meaning of a dialogue act; they add information about the way two or more semantically complete dialogue acts are related. They are therefore not part of a structure that describes a dialogue act, but occur in link structures that relate dialogue acts, as illustrated in (9).

An abstract syntax consists in general of: (a) a specification of the elements from which annotation structures are built up, called a ‘conceptual inventory’, and (b) a specification of the possible ways of constructing annotation structures using these elements. The DiAML abstract syntax is defined by the following specification:

DiAML abstract syntax specification.

a. Conceptual inventory

The DiAML conceptual inventory consists of five sets:

  1. A set of dimensions, notably the nine dimensions listed in Sect. 3.2.

  2. A set of communicative functions, namely the 56 functions listed in Table 1; the set is partitioned into ‘general-purpose’ functions, which can be used in any dimension, and for each dimension except Task a set of ‘dimension-specific’ functions (no task-specific communicative functions are defined, since the annotation scheme is designed to be application-independent). A subset RSP of the set of communicative functions is specified as the ‘responsive’ communicative functions.

  3. A set of qualifiers that can be associated with dialogue acts, partitioned into subsets for certainty, conditionality, and sentiment.

  4. A set of dialogue participants, including possible side-participants or audiences, besides actively participating speakers and addressees.

  5. A set of functional segments of primary data.

The sets of functional segments and dialogue participants are specific for a particular annotation task; the other concepts are task-independent.

b. Annotation structures

A DiAML annotation structure is a set

(14)

\(\{\epsilon_1, \ldots, \epsilon_k, L_1, \ldots, L_m\}\)

consisting of the entity structures \(\{\epsilon_1, \ldots, \epsilon_k\}\), with \(k \ge 1\), and the link structures \(\{L_1, \ldots, L_m\}\) (with \(m \ge 0\)). Entity structures contain semantic information about a functional segment; link structures describe semantic relations between functional segments.

An entity structure in DiAML is a pair

(15)

\(\epsilon = \langle m, \alpha \rangle\)

consisting of a functional segment m (a ‘markable’) and the characterization of a dialogue act \(\alpha \), which is an n-tuple with \(5 \le n \le 7\). In the most complex case a dialogue act takes the form of a 7-tuple, as in (16), where S is the sender of the dialogue act; A is a set of addressees; H is a set of non-participating witnesses of the dialogue; d is a dimension; f is a communicative function; Q is a set of qualifiers, and \(\varDelta \) is a set of other dialogue acts that the dialogue act in focus depends on.

(16)

\(\alpha = \langle S, A, H, d, f, Q, \varDelta \rangle\)

In the simplest case, a dialogue act occurs in a setting where there are no side-participants,Footnote 10 does not functionally depend on previous dialogue acts (i.e., does not have a responsive communicative function), and has no feedback dependence relation. In that case it is a quintuple \(\alpha = \langle S, A, d, f, Q \rangle\).

A link structure in DiAML is a triple \(\langle \epsilon, E, \rho \rangle\), consisting of an entity structure \(\epsilon \), corresponding to a dialogue act, a non-empty set E of entity structures that correspond to rhetorically related dialogue acts, and the rhetorical relation \(\rho \) that relates the dialogue acts in \(\epsilon \) and E.
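The abstract syntax above can be mirrored quite directly in code. The following Python sketch is only an illustration under simplifying assumptions: the class and field names are ours, participants, markables and antecedent dialogue acts are represented by plain string identifiers, and no check is made against the inventory of dimensions and communicative functions.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class DialogueAct:
    """The tuple <S, A, H, d, f, Q, Delta> of (16); H and Delta may be empty."""
    sender: str                                # S
    addressees: FrozenSet[str]                 # A
    dimension: str                             # d
    function: str                              # f
    qualifiers: FrozenSet[str] = frozenset()   # Q
    others: FrozenSet[str] = frozenset()       # H (side-participants)
    depends_on: FrozenSet[str] = frozenset()   # Delta (ids of antecedent acts)

    def arity(self) -> int:
        """Size n of the tuple (5 <= n <= 7): H and Delta count only if present."""
        return 5 + int(bool(self.others)) + int(bool(self.depends_on))


@dataclass(frozen=True)
class EntityStructure:
    markable: str       # m, identifier of a functional segment
    act: DialogueAct    # alpha


@dataclass(frozen=True)
class LinkStructure:
    source: EntityStructure               # epsilon
    related: FrozenSet[EntityStructure]   # E, must be non-empty
    relation: str                         # rho

    def __post_init__(self) -> None:
        if not self.related:
            raise ValueError("E must be a non-empty set of entity structures")


def annotation_structure(entities, links=()):
    """Build the set {e_1..e_k, L_1..L_m} of (14), enforcing k >= 1."""
    entities, links = frozenset(entities), frozenset(links)
    if not entities:
        raise ValueError("an annotation structure needs at least one entity structure")
    return entities | links


# Hypothetical identifiers (fs1, da1, ...) for a question, its answer,
# and an Inform act that gives the reason for the answer.
question = EntityStructure(
    "fs1", DialogueAct("A", frozenset({"B"}), "task", "propositionalQuestion"))
answer = EntityStructure(
    "fs2", DialogueAct("B", frozenset({"A"}), "task", "answer",
                       depends_on=frozenset({"da1"})))
inform = EntityStructure(
    "fs3", DialogueAct("B", frozenset({"A"}), "task", "inform"))
link = LinkStructure(inform, frozenset({answer}), "Cause")
structure = annotation_structure([question, answer, inform], [link])
```

In this sketch the question is a quintuple, while the answer, which has a functional dependence, is a 6-tuple; a link structure with an empty set E is rejected at construction time.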

4.3 DiAML representations

4.3.1 Anchoring annotations in primary data

DiAML relies on a three-level architecture:

  (1) a primary source, which may correspond to a speech recording, a video clip, a textual transcription, or a low-level annotation thereof;

  (2) the marking of functional segments in the primary source;

  (3) the dialogue act information associated with the functional segments.

Annotation in DiAML is concerned with level (3) and follows the stand-off annotation approach: annotations refer to segments of the primary data specified at level (2), and the primary data are kept separate. The 3-level architecture is clearly visible in DiAML-XML representations, such as (9), where functional segments appear as the values of the ‘target’ attribute, which are assumed to be given as markables; Fig. 3 shows how these markables can be defined at level 2 in a TEI-compliant way.

Fig. 3 TEI-compliant segmentation of primary data

The DiAML-TabSW format was defined in such a way that it fits into this 3-level architecture and facilitates comparisons between ISO 24617-2 annotations and SWBD-DAMSL annotations. This is described next.

4.3.2 The DiAML-TabSW format

DiAML-TabSW was designed to represent ISO 24617-2 annotations in a form that resembles the annotations in the Switchboard-DA corpus, shown in Table 4. Annotations in that form are not ISO-compliant in three respects: (1) the annotated segments correspond to slash units, which are more coarse-grained than functional segments and cannot be discontinuous or overlapping; (2) the annotated units are not represented in stand-off form but are defined in the same file as the annotations, which moreover contain in-line markups; (3) they do not support the annotation of relations between dialogue acts. The annotations can be made ISO-compliant by (1) re-segmenting the dialogue into functional segments and replacing slash unit numbers by references to segments of primary data in a separate file; (2) removing all in-line markups and adding ISO 24617-2 functional markups instead; (3) adding an identifier to each dialogue act, in order to allow the specification of relations between dialogue acts. The resulting format supports the representation of all the types of information in ISO 24617-2 annotations by using, instead of just communicative function names (or SWBD-DAMSL codes, like ‘qw’), expressions of the form (17), as illustrated by (18). For the dialogue fragment of Table 4, the resulting representation is shown in Table 6.

(17)

Dimension:Communicative Function (dependence:antecedent\(^*\)) [qualifiers]\(^*\)

{Rhetorical relation:antecedent\(^*\)}

(18)

Task:answer(da1)[uncertain]{Expansion:expander da7}
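Expressions of the form (17) are regular enough to be read mechanically. The Python sketch below shows one possible way to do so; it is not the parser used for the DialogBank. The function name `parse_tabsw` and the keys of the returned dictionary are our own, and the sketch assumes that an optional `kind:` prefix may precede the antecedents inside the parentheses. It also does not try to tell argument-role labels (such as ‘expander’) from antecedent identifiers inside the curly brackets; both are returned as ‘arguments’.

```python
import re

# One expression of form (17):
#   Dimension:Function (dependence) [qualifier]* {Relation:arguments}*
EXPR = re.compile(
    r"^\s*(?P<dimension>[^:()\[\]{}]+):"
    r"(?P<function>[^()\[\]{}]+?)\s*"
    r"(?:\((?P<dependence>[^)]*)\))?\s*"
    r"(?P<qualifiers>(?:\[[^\]]*\]\s*)*)"
    r"(?P<rhetorical>(?:\{[^}]*\}\s*)*)$"
)


def parse_tabsw(expr: str) -> dict:
    """Split a DiAML-TabSW style expression into its components."""
    m = EXPR.match(expr)
    if m is None:
        raise ValueError(f"not an expression of form (17): {expr!r}")
    # Assumption: "(functional:da1)" and "(da1)" are both accepted.
    kind, _, antecedents = (m.group("dependence") or "").rpartition(":")
    rhetorical = []
    for body in re.findall(r"\{([^}]*)\}", m.group("rhetorical")):
        relation, _, rest = body.partition(":")
        rhetorical.append({"relation": relation.strip(), "arguments": rest.split()})
    return {
        "dimension": m.group("dimension").strip(),
        "function": m.group("function").strip(),
        "dependence_kind": kind.strip() or None,
        "antecedents": antecedents.replace(",", " ").split(),
        "qualifiers": [q.strip() for q in
                       re.findall(r"\[([^\]]*)\]", m.group("qualifiers"))],
        "rhetorical": rhetorical,
    }


parsed = parse_tabsw("Task:answer(da1)[uncertain]{Expansion:expander da7}")
```

Applied to (18), this yields the dimension Task, the function answer, the antecedent da1, the qualifier uncertain, and one rhetorical relation Expansion with the arguments ‘expander’ and ‘da7’.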

The functional segment identifiers in the first column in Table 6 refer to stretches of the primary data specified for instance as a sequence of word tokens or as a stretch of speech with a given start- and end point. This file corresponds to level (2) in the 3-level architecture, and forms an implementation of stand-off annotation in tabular form. It remedies the limitation of SWBD-DAMSL annotations of being unable to deal with discontinuous or overlapping functional segments. For example, the discontinuous functional segment fs3 in Table 6 is specified in the file sw0105-fs as consisting of the word tokens w12, w13, w14, and w16, which form the discontinuous segment (I, kind, of, [uh], I).
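The stand-off idea can be illustrated with a small Python sketch. The dictionaries below stand in for the token file and for the segment file (sw0105-fs); the actual file layout is not shown here, so this representation is an assumption, as is the reading that the skipped filler “uh” is token w15.

```python
# Level 1: word tokens of the primary data (w15 = "uh" is an assumption).
TOKENS = {"w12": "I", "w13": "kind", "w14": "of", "w15": "uh", "w16": "I"}

# Level 2: functional segments defined as lists of token identifiers.
SEGMENTS = {"fs3": ["w12", "w13", "w14", "w16"]}


def token_index(token_id: str) -> int:
    """Numeric position of a token identifier such as 'w12'."""
    return int(token_id[1:])


def segment_text(segment_id: str) -> str:
    """Words of a functional segment, looked up in the token inventory."""
    return " ".join(TOKENS[t] for t in SEGMENTS[segment_id])


def is_discontinuous(segment_id: str) -> bool:
    """True if the segment skips at least one token of the primary data."""
    indices = [token_index(t) for t in SEGMENTS[segment_id]]
    return any(b - a != 1 for a, b in zip(indices, indices[1:]))


fs3_text = segment_text("fs3")
fs3_gap = is_discontinuous("fs3")
```

Because the annotation refers to fs3 only by its identifier, the discontinuity is handled entirely at level (2) and never shows up in the dialogue act information at level (3).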

For the sake of readability, the text of a functional segment is represented in an extra column; the transcripts of speaker turns were retained, allowing one to see immediately where a functional segment occurs in an utterance. The two rightmost columns are strictly speaking redundant, and play no role in the semantic interpretation of DiAML annotations.Footnote 11

Table 6 ISO 24617-2 annotation of Switchboard SWBD-DA dialogue fragment in Table 4, represented in DiAML-TabSW format

4.3.3 The DiAML-MultiTab format

The representations produced by the DitAT tool (see Table 5) are not ISO-compliant, because they define functional segments within the annotation file and do not support the annotation of relations between dialogue acts.

Table 7 ISO 24617-2 annotation of TRAINS dialogue fragment from Table 5 slightly extended and represented in DiAML-MultiTab format

Full ISO-compliance can be achieved in a similar way as above, by using functional segment identifiers as references to a separate file; introducing identifiers for each dialogue act; and replacing communicative function names by dialogue act descriptions in the form (17). The resulting DiAML-MultiTab representation is shown in Table 7.

4.4 Advantages of alternative representation formats

The DiAML-XML representation format was originally motivated by the relative compactness of its expressions, compared to full-out standard XML, and by its transparent semantics. When developing the DialogBank, the DiAML-TabSW and DiAML-MultiTab formats were helpful in the process of re-annotating dialogues from the Switchboard-DA corpus and dialogues that had been annotated according to an earlier version of the DIT\(^{++}\) scheme, taking the original annotations into account rather than annotating these dialogues entirely from scratch. In particular, in this process inconsistencies and omissions were often noted in the original annotations, also in cases where the ISO scheme had already been applied but corrections were needed in order to achieve gold standard quality; the tabular representations were helpful in the detection and correction of errors. User-based evaluation has shown the usability of both tabular DiAML formats, for trained as well as for untrained annotators (Wijnhoven 2016).

The interoperability of the three DiAML representation formats has been exploited by implementing conversions between any two of the three formats, using their common underlying abstract syntax as an interlingua. This allows users to view (and to produce) ISO 24617-2 annotations in the representation format that is most convenient for them. A Python script for this purpose, which runs on both MS Windows and Apple platforms (Wijnhoven 2016), is available from the DialogBank.Footnote 12
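The role of the abstract syntax as an interlingua can be sketched as follows. This is not the conversion script mentioned above; it is a minimal Python illustration in which one format-neutral record is rendered both as a tabular expression of form (17) and as a `<dialogueAct>` element. The record layout, the function names, and the XML attribute names other than ‘target’ are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # serialized as xml:id

# A format-neutral description of one dialogue act (hypothetical identifiers).
ACT = {
    "id": "da2", "target": "fs2", "sender": "p2", "addressee": "p1",
    "dimension": "Task", "function": "answer",
    "depends_on": ["da1"],
    "qualifiers": {"certainty": "uncertain"},
}


def to_tabsw(act: dict) -> str:
    """Render the record as a tabular expression of form (17)."""
    deps = " ".join(act["depends_on"])
    quals = "".join(f"[{value}]" for value in act["qualifiers"].values())
    return (f"{act['dimension']}:{act['function']}"
            + (f"({deps})" if deps else "") + quals)


def to_xml(act: dict) -> ET.Element:
    """Render the same record as a <dialogueAct> element (assumed attributes)."""
    attrs = {
        XML_ID: act["id"],
        "target": "#" + act["target"],
        "sender": "#" + act["sender"],
        "addressee": "#" + act["addressee"],
        "dimension": act["dimension"],
        "communicativeFunction": act["function"],
    }
    if act["depends_on"]:
        attrs["functionalDependence"] = " ".join("#" + d for d in act["depends_on"])
    attrs.update(act["qualifiers"])  # e.g. certainty="uncertain"
    return ET.Element("dialogueAct", attrs)


tab_form = to_tabsw(ACT)
xml_form = to_xml(ACT)
xml_text = ET.tostring(xml_form, encoding="unicode")
```

Since both renderings are derived from the same record, a conversion between two formats amounts to reading one format into such a record and writing it out in the other.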

5 ISO 24617-2 limitations and extensions

In building the DialogBank, two limitations of the ISO 24617-2 annotation scheme were discovered, which cause feedback dependence relations and rhetorical relations between dialogue acts to be annotated less accurately than is possible. These limitations, discussed below, are planned to be remedied in a revised version of the standard.

5.1 Annotating feedback dependence relations

Feedback acts are about the processing of something that was said before. The nature of this ‘something’ depends on the kind of feedback. Feedback by means of expressions like “OK”, “Uh-huh”, or “Really?” is about one or more previous dialogue acts, while feedback by means of “Tuesday?” or “What did you say?” is about a previous utterance segment, rather than about a dialogue act. The ISO 24617-2 annotation scheme therefore allows both dialogue acts and functional segments as antecedents for feedback dependence relations.

The ISO scheme is not quite accurate at this point, since segment-related feedback is not necessarily about a functional segment; it may be about any previous segment, functional or not, such as a single word or a sequence of words within a functional segment. In the latter case the ISO scheme only allows annotating a feedback dependence relation to the functional segment containing the expression that the feedback act refers to. In the planned Edition 2 of the ISO standard, the possibility will be offered to refer back to non-functional segments of communicative behaviour. This has already been done in the DBOX dialogues in the DialogBank, which deviate in this respect from the current standard.

5.2 Annotating rhetorical relations

ISO 24617-2 does not require the marking up of rhetorical relations, such as Cause, Contrast, or Elaboration, and does not specify any particular set of relations that could be used; it only specifies how a rhetorical relation between two dialogue acts can be marked up, namely by means of a rhetoricalLink element that indicates two dialogue acts and a rhetorical relation, as illustrated in (20).

As mentioned above, users of the ISO standard have sometimes included annotations of rhetorical relations, mostly by using the DR-Core relations defined in ISO 24617-8 with a few extensions. When re-annotating or newly annotating dialogues for inclusion in the DialogBank, two limitations were noted: (1) the lack of a possibility to mark up argument roles, and (2) the impossibility to distinguish between a rhetorical relation that links two dialogue acts and one that links the semantic content of two dialogue acts (or mixed cases). These problems are discussed in the rest of this section.

Rhetorical relations are commonly assumed to have two arguments; a Cause relation, for example, has the two arguments ‘Reason’ and ‘Result’ (or ‘Cause’ and ‘Effect’). The DR-Core annotation scheme requires argument roles to be marked up, as in (19), where the event of John pushing Tim is marked up as being a reason for the event of Tim falling on the ground.

(19)

John pushed Tim. He fell on the ground.

[image: DR-Core annotation of (19)]

ISO 24617-2, by contrast, provides just a single slot for specifying a rhetorical relation, and has no provisions for marking up argument roles, as illustrated in (20), where the ‘rhetoricalLink’ element indicates the occurrence of a causal relation between the Inform act expressed by “he has the flu” and the answer “He didn’t come in”, but this does not make clear that the information in the Inform act is the reason in the causal relation, rather than the result.

(20)

A: Have you seen Pete today?

B: He didn’t come in; he has the flu.

[image: DiAML annotation of (20) with a ‘rhetoricalLink’ element]

In some of the annotations in the DialogBank this limitation has been addressed by marking up a relation plus an argument role in strings of the form ‘Cause:Reason’. From a semantic point of view, this is not an adequate solution since the underlying abstract syntax and semantics only include rhetorical relations, no argument roles.

Another limitation of the annotation of rhetorical relations in ISO 24617-2 is that it is not possible to distinguish between ‘semantic’ and ‘pragmatic’ interpretations of such relations. Example (21) illustrates this distinction:

(21)

A:

Have you seen Pete today?

a. B:

He didn’t come in. He has the flu.

b. B:

He didn’t come in. He sent me a message saying that he has the flu.

B’s utterances in (21a) are causally related in the sense that the semantic content of the second utterance (Pete has the flu) is the reason for the content of the first utterance. In (21b), by contrast, there is a ‘pragmatic’ causal relation in the sense that the second utterance expresses the reason why B says that Pete didn’t come in: B’s second utterance expresses the cause of the occurrence of the dialogue act of informing A that Pete did not come in today.

In the DR-Core annotation scheme this distinction is represented by indicating the types of the arguments, where ‘dialogue act’ is one of the possible types, and the type of the semantic content of a dialogue act (e.g. event or state) is another. This is illustrated in example (22), which shows the annotation of the examples in (21) represented in the markup language of DR-Core, DRelML (Discourse Relations Markup Language).

(22)

a.

[image: DRelML annotation of (21a)]

b.

[image: DRelML annotation of (21b)]

In both (22a) and (22b) an implicit Cause relation is marked up between the arguments expressed by the markables fs2 (“Pete did not come in today”) and fs3 (“He has the flu” and “He sent me a message saying that he has the flu”, respectively), but in the former case the first argument is the event of Pete not coming in, which is caused by the second argument, while in the latter case it is the dialogue act of B informing A that Pete did not come in which is caused by the second argument. This distinction cannot be expressed in DiAML. In DRelML, on the other hand, no information about the arguments of a rhetorical relation can be represented other than their semantic types. For marking up rhetorical relations between dialogue acts it would thus seem attractive to combine ingredients from DiAML and DRelML. This is discussed in the next subsection.

5.3 Combinations of annotation schemes

It was noted in Sect. 3.2 that DiAML-XML is in fact a compact way of using XML, as illustrated by (9b) and (10). Likewise, a DRelML annotation of a rhetorical relation like the one in (23a) is a compact form of the full XML expression in (23b):

(23)

a.

He didn’t come in. He has the flu.

b.

[image: full XML form of the DRelML annotation in (23a)]

Since the concatenation of two XML-expressions is again a legitimate XML-expression, we may combine the relevant bits of a DiAML annotation of dialogue acts and a DRelML annotation of rhetorical relations. Applied to B’s utterances in the example (21b) this would lead to the representation shown in (24b) and in compact form in (24c).

(24)

a.

(A: Have you seen Pete today?)

B: He didn’t come in. He sent me a message saying that he has the flu.

b.

[image: combined DiAML and DRelML annotation of (24a) in full XML]

c.

[image: the same annotation in compact form]

Simply concatenating bits of XML, either in full or in compact form, is not satisfactory, however, since it would lead to having two different annotations of the same segment (segment s3), one that views the segment as a dialogue act and one that views it as an event. Both views are justifiable, but there is no relation between the two views, which makes the semantic interpretation of such expressions problematic.

The missing link is that of semantic content: the event view describes the semantic content of the dialogue act da3, so this can be resolved by introducing an XML attribute @semanticContent in the <dialogueAct> element, whose value is the event in question.
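How such an attribute would tie the two views together can be sketched in Python as follows. The XML fragment is hypothetical: only the `<dialogueAct>` element, the ‘target’ attribute and the proposed @semanticContent attribute come from the text above, while the `<event>` element, the wrapper element, the identifier e3 and the remaining attribute names are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id

# Hypothetical fragment: da3 and the event e3 both annotate segment s3.
DOC = """
<annotation>
  <dialogueAct xml:id="da3" target="#s3" dimension="task"
               communicativeFunction="inform" semanticContent="#e3"/>
  <event xml:id="e3" target="#s3"/>
</annotation>
"""


def semantic_content(root: ET.Element, act_id: str):
    """Return the element that a dialogue act names as its semantic content."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    reference = by_id[act_id].get("semanticContent")
    return by_id.get(reference.lstrip("#")) if reference else None


root = ET.fromstring(DOC)
content = semantic_content(root, "da3")
```

With the link in place, a rhetorical relation can take either da3 or e3 as its argument, and an interpreter can still recover that the two annotations describe the same segment.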

Introducing information about the semantic content of dialogue acts, not just in the representations but also in the underlying abstract syntax and semantics, opens up interesting possibilities of combining dialogue act annotation with semantic information addressed by other annotation schemes, in particular by ISO 24617-1 (‘ISO-TimeML’) and ISO 24617-7 (ISO-Space) for the annotation of events and their temporal and spatial properties, by ISO 24617-4 (Semantic Roles) for adding information about the participants in an event, and in the future also for adding information about quantification over events and their participants (see Bunt et al. 2018a).

6 Conclusions and future work

The DialogBank had its first public release in December 2015. It contains at the time of writing annotated dialogues with the properties shown in Table 1. Material from English-language dialogue corpora (HCRC Map Task, Switchboard, TRAINS) and from Dutch-language corpora (DIAMOND, OVIS, Schiphol, Dutch Map Task) was re-segmented and re-annotated according to ISO 24617-2. To facilitate comparisons between original and ISO-compliant segmentation and annotation, as well as in support of the detection and correction of errors and omissions, two tabular representation formats were defined that were shown to be ideal (complete and unambiguous) and hence interoperable with the reference DiAML-XML format of the ISO 24617-2 standard. The interoperability was exploited by implementing conversions between the three representation formats, allowing users of the DialogBank to view (or to download and use) the annotated dialogues in the form that is most convenient for them.

Building the DialogBank brought certain limitations of ISO 24617-2 to light for accurately annotating the ‘antecedents’ of feedback acts (as well as of speech editing acts, i.e. acts in the Own Communication Management or in the Partner Communication Management dimension). This issue will be addressed in the planned revision of the ISO 24617-2 standard (see Bunt et al. 2017a).

Another lesson learned from building the DialogBank concerns the annotation of rhetorical relations in dialogue. In ISO 24617-2 this is just an option; there is no obligation to mark up such relations, but the rhetorical linking in a sequence of dialogue acts often needs to be known for a good understanding of the dialogue. It would therefore be desirable to integrate the annotation of rhetorical relations into dialogue act annotation. We have seen that doing this in an adequate fashion requires adding support for the annotation of information about the semantic content of a dialogue act. Realizing such an addition would be challenging but promising for obtaining semantically richer annotations, which could be useful for a variety of applications in human-computer dialogue systems (see e.g. Malchanau 2018).

The revised annotation standard ISO 24617-2, Second Edition, is planned to be designed in a way that is ‘downward compatible’ with the first edition (see Bunt et al. 2018b), in the sense that annotations made with the first edition will remain valid according to the second edition (just being slightly less accurate or informative). The current content of the DialogBank therefore will not need to be adapted to the standard’s second edition, although it may be interesting to do so in some cases.

Future work will aim at (1) increasing the number of annotated dialogues in the DialogBank; (2) including dialogues annotated according to the revised ISO standard; (3) including annotated dialogues in other languages besides English and Dutch. Languages for which annotation according to the ISO standard has been undertaken or is being considered include Italian (Chowdhury et al. 2016; Mezza et al. 2018), Vietnamese (Ngo et al. 2018), and Chinese (Fang et al. 2018).