Abstract
With the increasing complexity of automated telephone-based applications, we require new means to detect problems occurring in the dialog between system and user in order to support task completion. Anger and frustration are important symptoms indicating that task completion and user satisfaction may be endangered. This chapter gives an in-depth account of the aspects relevant to anger detection in interactive voice response (IVR) systems and presents an anger detection system that combines several knowledge sources to robustly detect angry user turns. We consider acoustic, linguistic, and interaction parameter-based information that can be collected and exploited for anger detection. Further, we introduce a subcomponent that estimates the emotional state of the caller based on the caller’s previous emotional state. Based on a corpus of 1,911 calls from an IVR system, we illustrate the various characteristics of angry and frustrated callers.
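The subcomponent that conditions on the caller’s previous emotional state can be illustrated with a minimal sketch. This is not the authors’ implementation: the transition probabilities, the `fuse` function, and the linear blending weight are all illustrative assumptions, standing in for whichever per-turn classifier score and history model the system actually uses.

```python
# Hypothetical sketch: blend a per-turn anger score with a prior derived
# from the caller's previous emotional state. All numbers are illustrative.

# Transition prior P(state_t | state_{t-1}) over the two-class problem
# (angry vs. non-angry) used throughout the chapter.
TRANSITIONS = {
    "non-angry": {"non-angry": 0.9, "angry": 0.1},
    "angry":     {"non-angry": 0.4, "angry": 0.6},
}

def fuse(prev_state: str, acoustic_score: float, weight: float = 0.5) -> str:
    """Blend the acoustic anger score with the transition prior.

    acoustic_score is assumed to be P(angry | features of the current turn).
    """
    prior_angry = TRANSITIONS[prev_state]["angry"]
    p_angry = weight * acoustic_score + (1 - weight) * prior_angry
    return "angry" if p_angry >= 0.5 else "non-angry"

# The same borderline acoustic score is resolved differently depending on
# the caller's emotional history.
print(fuse("non-angry", 0.55))  # non-angry
print(fuse("angry", 0.55))      # angry
```

The point of the sketch is that a borderline turn after an already angry turn tips toward "angry", while the same turn after a neutral history does not.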
Notes
1. A sequence of five consecutive user turns.
2. The user interrupts the system prompt by speaking.
3. The term non-angry is used as an all-encompassing term for all emotions other than anger. However, since callers typically do not talk in a “happy,” “sad,” or “disgusted” manner to an IVR, “non-angry” speech contains predominantly neutral speech.
4. The classifier is tested with one part of the set and trained with the remaining nine parts. This process is iterated ten times and the performance is averaged.
5. The NoInput event occurs when the caller does not reply to a system question within a certain time slot.
6. The NoMatch event is triggered when the ASR is unable to recognize the user utterance with the help of the activated grammars.
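The ten-fold cross-validation procedure described in Note 4 can be sketched as follows. The splitting helper and the call-count of 100 are illustrative; the chapter's experiments use the 1,911-call corpus and its own classifiers.

```python
# Minimal sketch of ten-fold cross-validation (Note 4): the data are split
# into ten parts, each part serves once as the test set while training uses
# the remaining nine, and per-fold performance is averaged.

def ten_fold_indices(n: int, folds: int = 10):
    """Yield (train_indices, test_indices) for each fold over n samples."""
    fold_size = n // folds
    indices = list(range(n))
    for k in range(folds):
        test = indices[k * fold_size:(k + 1) * fold_size]
        train = indices[:k * fold_size] + indices[(k + 1) * fold_size:]
        yield train, test

# Example with 100 turns: ten folds, each testing on 10 turns and
# training on the other 90.
splits = list(ten_fold_indices(100))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 90 10
```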
Copyright information
© 2010 Springer Science+Business Media, LLC
Cite this chapter
Schmitt, A., Pieraccini, R., Polzehl, T. (2010). “For Heaven’s Sake, Gimme a Live Person!” Designing Emotion-Detection Customer Care Voice Applications in Automated Call Centers. In: Neustein, A. (eds) Advances in Speech Recognition. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-5951-5_9
Print ISBN: 978-1-4419-5950-8
Online ISBN: 978-1-4419-5951-5