Introduction

  • Tobias Heinroth
  • Wolfgang Minker
Chapter

Abstract

During the past few decades, the development of Spoken Dialogue Systems (SDSs) has advanced significantly, driven by the increasing miniaturisation of electronics combined with falling costs. This paved the way for the development of specialised speech recognition and synthesis algorithms. The emergence of powerful mobile devices, along with the increasing accessibility of the Internet, has also enabled a multitude of speech-related applications. In cellular telephony, SDSs are already quite sophisticated. Apart from information retrieval, call routing, and transactional applications, technical support systems for customers have become more widely available. Such automated agents help callers, for example, to solve Internet-related problems or to resolve technical issues with various devices. In automotive applications such as route guidance or the control of entertainment systems, a plethora of spoken command systems is available in which more or less regular spontaneous speech is accurately understood. Two key technologies have facilitated these advancements: voice recognition (Automatic Speech Recognition, ASR) and speech synthesis (Text-to-Speech, TTS). Apart from these technologies, an SDS also performs linguistic and semantic analysis and text generation, and contains a Spoken Dialogue Manager (SDM) that determines the behaviour and the conversational characteristics of the system.

Keywords

Automatic Speech Recognition; Speech Synthesis; Dialogue Strategy; Voice Recognition; Evaluation Session
These keywords were added by machine and not by the authors.

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Tobias Heinroth (1)
  • Wolfgang Minker (1)
  1. Institute of Communications Engineering, University of Ulm, Ulm, Germany
