Who, Why and How Often? Key Elements for the Design of a Successful Speech Application Taking Account of the Target Groups

Oberle, Frank

doi:10.1007/978-3-540-78343-5_1

Part of the book series: Signals and Communication Technologies (SCT)

Abstract

Three questions must be answered before designing a speech application: who will use it, why will they use it, and how often will they use it? The designer needs the answers to all three to address the needs of the target group effectively. This chapter outlines a methodical process model describing the workflow required to build a speech application that is properly designed for its target groups. The workflow covers requirements analysis, specification, implementation, production, delivery and operation. The chapter also summarizes the most important information needed to describe a voice user interface and where that information can be found. It then reviews current and future technical developments in speech processing and their relevance to future dialogue design. Finally, we recommend 11 design features that, in our experience, help the designer of a voice user interface exploit knowledge about the user and focus the dialogue design on the user's abilities, competence, expectations and needs.

Author information

Authors and Affiliations

T-Systems Enterprise Service GmbH, Berlin, Germany
Frank Oberle

About this chapter

Cite this chapter

Oberle, F. (2008). Who, Why and How Often? Key Elements for the Design of a Successful Speech Application Taking Account of the Target Groups. In: Usability of Speech Dialog Systems. Signals and Communication Technologies. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78343-5_1

DOI: https://doi.org/10.1007/978-3-540-78343-5_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78342-8
Online ISBN: 978-3-540-78343-5
eBook Packages: Engineering; Engineering (R0)