Abstract
Embodied conversational agents employed in multimodal interaction applications have the potential to achieve properties similar to those of humans in face-to-face conversation. They enable the inclusion of both verbal and nonverbal communication. Thus, the degree of personalization of the user interface is much higher than in other human-computer interfaces. This, of course, greatly contributes to the naturalness and user-friendliness of the interface, opening up a wide area of possible applications. Two implementations of embodied conversational agents in human-computer interaction are presented in this paper: the first in a Wizard-of-Oz application and the second in a dialogue system. In the Wizard-of-Oz application, the embodied conversational agent conveys the spoken information of the operator to the user with whom the operator communicates. Depending on the scenario of the application, the user may or may not be aware of the operator's involvement. The operator can communicate with the user via audio/visual, or audio-only, communication. This paper describes an application setup that enables distant communication with the user, where the user is unaware of the operator's involvement. A real-time viseme recognizer is needed to ensure a proper response from the agent. In addition, the implementation of the embodied conversational agent Lili, which hosts an entertainment show broadcast by RTV Slovenia, will be described in more detail. The employment of the embodied conversational agent as a virtual major-domo named Maja within an intelligent ambience, using a speech recognition system and the PLATTOS TTS system, will also be described.
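The abstract notes that a real-time viseme recognizer is needed so the agent's face can respond to the operator's speech while the operator stays hidden. The sketch below illustrates that routing idea only; the paper's actual recognizer is not described here, so every name (`VISEME_MAP`, `AgentMouth`, `route_frame`) is a hypothetical stand-in, and the phoneme-to-viseme grouping is an assumed coarse mapping, not the one used in the system.

```python
# Hypothetical sketch of the Wizard-of-Oz routing step: recognized phoneme
# frames from the operator's speech are mapped to visemes and forwarded to
# the embodied agent's face renderer in real time. All names are illustrative.

# A coarse phoneme-to-viseme grouping (assumed, not taken from the paper).
VISEME_MAP = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "a": "open", "o": "rounded", "u": "rounded",
    "sil": "closed",
}

class AgentMouth:
    """Stand-in for the embodied agent's face renderer."""
    def __init__(self):
        self.current = "closed"
        self.history = []

    def set_viseme(self, viseme):
        self.current = viseme
        self.history.append(viseme)

def route_frame(phoneme, mouth):
    """Forward one recognized phoneme frame to the agent as a viseme.

    Unknown phonemes fall back to a neutral 'closed' shape so the agent
    never freezes on recognizer errors."""
    mouth.set_viseme(VISEME_MAP.get(phoneme, "closed"))

# Simulated recognizer output for a short operator utterance.
mouth = AgentMouth()
for ph in ["sil", "m", "a", "m", "a", "sil"]:
    route_frame(ph, mouth)

print(mouth.history)
```

In a real deployment the loop would consume audio frames from the operator's channel with low latency, since any lag between the synthesized voice and the agent's lip movements would reveal the operator's involvement.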
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
Cite this paper
Rojc, M., Rotovnik, T., Brus, M., Jan, D., Kačič, Z. (2007). Embodied Conversational Agents in Wizard-of-Oz and Multimodal Interaction Applications. In: Esposito, A., Faundez-Zanuy, M., Keller, E., Marinaro, M. (eds) Verbal and Nonverbal Communication Behaviours. Lecture Notes in Computer Science(), vol 4775. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76442-7_26
Print ISBN: 978-3-540-76441-0
Online ISBN: 978-3-540-76442-7