
The ALICO corpus: analysing the active listener


Abstract

The Active Listening Corpus (ALICO) is a multimodal data set of spontaneous dyadic conversations in German with diverse speech and gestural annotations of both dialogue partners. The annotations consist of short feedback expression transcriptions with corresponding communicative function interpretations as well as segmentations of interpausal units, words, rhythmic prominence intervals and vowel-to-vowel intervals. Additionally, ALICO contains head gesture annotations of both interlocutors. The corpus contributes to research on spontaneous human–human interaction, on functional relations between modalities, and on timing variability in dialogue. It also provides data that differentiates between distracted and attentive listeners. We describe the main characteristics of the corpus and briefly present the most important results obtained from analyses in recent years.


Notes

  1. Four annotators in total worked on the feedback function interpretation in ALICO, namely the first four authors of this paper, of whom JS, MW and ZM are competent but non-native speakers of German. Annotation tasks were assigned in rotation, with three annotators per recorded session.

  2. While it would be preferable to use a multi-annotator agreement measure such as Fleiss's κ, this is problematic on the present dataset because each dialogue was annotated by a different subset of annotators. For this reason, we resort to pairwise comparisons between individual annotators, as illustrated in the sketch below.


Acknowledgments

This research was supported by the Deutsche Forschungsgemeinschaft (DFG) in the Collaborative Research Center 673 “Alignment in Communication” and the Center of Excellence EXC 277 “Cognitive Interaction Technology” (CITEC), as well as the Swedish Research Council (VR) projects “Samtalets rytm” (2009–1766) and “Andning i samtal” (2014–1072).

Author information


Corresponding author

Correspondence to Zofia Malisz.

Additional information

Zofia Malisz, Marcin Włodarczak, Hendrik Buschmeier and Joanna Skubisz have contributed equally to this article.

Appendix

See Fig. 11 and Tables 13, 14.

Table 13 ALICO data overview
Fig. 11

Confusion matrices for each annotator pair annotating the core feedback function categories P1, P2, P3 and A. Labels were stripped of all modifiers (e.g. C, E or A in modifier role). The shading of the cells indicates the relative frequency of each label combination and is comparable across confusion matrices. The numbers in each cell show absolute frequencies and are not comparable across confusion matrices

Table 14 Frequency of specific short feedback expressions (SFEs) found in ALICO as classified into three semantic categories (see Table 7)


Cite this article

Malisz, Z., Włodarczak, M., Buschmeier, H. et al. The ALICO corpus: analysing the active listener. Lang Resources & Evaluation 50, 411–442 (2016). https://doi.org/10.1007/s10579-016-9355-6
