The MultiLis Corpus – Dealing with Individual Differences in Nonverbal Listening Behavior
Computational models that attempt to predict when a virtual human should backchannel are often based on the analysis of recordings of face-to-face conversations between humans. Building a model from a corpus brings with it the problem that people differ in the way they behave. The data provide examples of the responses of a single person in a particular context, but in the same context another person might not have responded. Conversely, the corpus will contain contexts in which the recorded listener did not produce a backchannel response where another person would have. Listeners can differ in the number, timing, and type of backchannels they provide to the speaker because of individual differences related to, for instance, personality, gender, or culture. To gain more insight into this variation, we collected data in which we recorded the behavior of three listeners interacting with one speaker. All listeners believe they are having a one-on-one conversation with the speaker, while the speaker actually sees only one of them. The context, in this case the speaker’s actions, is the same for all three listeners, and each responds to it individually. In this way we have created data on cases in which different persons show similar behavior and cases in which they behave differently. With the recordings from this data collection study we can start building a model of backchannel behavior for virtual humans that takes into account similarities and differences between persons.
Keywords: Multimodal corpus, listeners, task-oriented