Abstract
With the increasing complexity of automated telephone-based applications, we require new means to detect problems occurring in the dialog between system and user in order to support task completion. Anger and frustration are important symptoms indicating that task completion and user satisfaction may be endangered. This chapter gives an in-depth account of the aspects relevant to anger detection in interactive voice response (IVR) systems and presents an anger detection system that combines several knowledge sources to robustly detect angry user turns. We consider acoustic, linguistic, and interaction parameter-based information that can be collected and exploited for anger detection. Further, we introduce a subcomponent that estimates the emotional state of the caller based on the caller’s previous emotional state. Based on a corpus of 1,911 calls from an IVR system, we illustrate the various characteristics of angry and frustrated callers.
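The subcomponent that conditions on the caller’s previous emotional state can be illustrated with a minimal sketch. This is not the authors’ implementation: the transition probabilities, the `fuse` function, and the linear blending weight are all illustrative assumptions, standing in for whichever per-turn classifier score and history model the system actually uses.

```python
# Hypothetical sketch: blend a per-turn anger score with a prior derived
# from the caller's previous emotional state. All numbers are illustrative.

# Transition prior P(state_t | state_{t-1}) over the two-class problem
# (angry vs. non-angry) used throughout the chapter.
TRANSITIONS = {
    "non-angry": {"non-angry": 0.9, "angry": 0.1},
    "angry":     {"non-angry": 0.4, "angry": 0.6},
}

def fuse(prev_state: str, acoustic_score: float, weight: float = 0.5) -> str:
    """Blend the acoustic anger score with the transition prior.

    acoustic_score is assumed to be P(angry | features of the current turn).
    """
    prior_angry = TRANSITIONS[prev_state]["angry"]
    p_angry = weight * acoustic_score + (1 - weight) * prior_angry
    return "angry" if p_angry >= 0.5 else "non-angry"

# The same borderline acoustic score is resolved differently depending on
# the caller's emotional history.
print(fuse("non-angry", 0.55))  # non-angry
print(fuse("angry", 0.55))      # angry
```

The point of the sketch is that a borderline turn after an already angry turn tips toward "angry", while the same turn after a neutral history does not.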
Notes
1. A sequence of five consecutive user turns.
2. The user interrupts the system prompt by speaking.
3. The term non-angry is used as an all-encompassing term for all emotions other than anger. However, since callers typically do not talk in a “happy,” “sad,” or “disgusted” manner to an IVR, “non-angry” speech contains predominantly neutral speech.
4. The classifier is tested with one part of the set and trained with the remaining nine parts. This process is iterated ten times and the performance is averaged.
5. The NoInput event occurs when the caller does not reply to a system question within a certain time slot.
6. The NoMatch event is triggered when the ASR is unable to recognize the user utterance with the help of the activated grammars.
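The ten-fold cross-validation procedure described in Note 4 can be sketched as follows. The splitting helper and the call-count of 100 are illustrative; the chapter's experiments use the 1,911-call corpus and its own classifiers.

```python
# Minimal sketch of ten-fold cross-validation (Note 4): the data are split
# into ten parts, each part serves once as the test set while training uses
# the remaining nine, and per-fold performance is averaged.

def ten_fold_indices(n: int, folds: int = 10):
    """Yield (train_indices, test_indices) for each fold over n samples."""
    fold_size = n // folds
    indices = list(range(n))
    for k in range(folds):
        test = indices[k * fold_size:(k + 1) * fold_size]
        train = indices[:k * fold_size] + indices[(k + 1) * fold_size:]
        yield train, test

# Example with 100 turns: ten folds, each testing on 10 turns and
# training on the other 90.
splits = list(ten_fold_indices(100))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 90 10
```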
Copyright information
© 2010 Springer Science+Business Media, LLC
Cite this chapter
Schmitt, A., Pieraccini, R., Polzehl, T. (2010). “For Heaven’s Sake, Gimme a Live Person!” Designing Emotion-Detection Customer Care Voice Applications in Automated Call Centers. In: Neustein, A. (eds) Advances in Speech Recognition. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-5951-5_9
Print ISBN: 978-1-4419-5950-8
Online ISBN: 978-1-4419-5951-5