Abstract
Three questions have to be answered before designing a speech application: who will use it, why will they use it, and how often will they use it? A designer needs answers to all three to address the needs of the target group. This chapter outlines a procedural model describing the workflow required to build a speech application that is properly designed for its target groups. The workflow covers requirements analysis, specification, implementation, production, delivery and operation. The chapter also gives an overview of the most important information needed to describe a voice user interface and where this information can be found, and it surveys current and future technical developments in speech processing and their relevance for future dialogue design. Finally, we recommend 11 design features which, in our experience, help the designer of a voice user interface exploit knowledge about the user and focus the dialogue design on the user’s abilities, competence, expectations and needs.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Oberle, F. (2008). Who, Why and How Often? Key Elements for the Design of a Successful Speech Application Taking Account of the Target Groups. In: Usability of Speech Dialog Systems. Signals and Communication Technologies. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78343-5_1
DOI: https://doi.org/10.1007/978-3-540-78343-5_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78342-8
Online ISBN: 978-3-540-78343-5
eBook Packages: Engineering, Engineering (R0)