
“For Heaven’s Sake, Gimme a Live Person!” Designing Emotion-Detection Customer Care Voice Applications in Automated Call Centers

Chapter in Advances in Speech Recognition

Abstract

With the increasing complexity of automated telephone-based applications, we need new means of detecting problems in the dialog between system and user in order to support task completion. Anger and frustration are important symptoms that task completion and user satisfaction may be at risk. This chapter gives an extensive account of the aspects relevant to anger detection in interactive voice response (IVR) systems and describes an anger detection system that draws on several knowledge sources to robustly detect angry user turns. We consider acoustic, linguistic, and interaction parameter-based information that can be collected and exploited for anger detection. Further, we introduce a subcomponent that estimates the caller's current emotional state based on the caller's previous emotional state. We illustrate these aspects of angry and frustrated callers on a corpus of 1,911 calls from an IVR system.
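As a rough illustration of the architecture outlined above, the following Python sketch shows one way the per-turn scores from the three knowledge sources and the previous-state subcomponent could interact. It is a minimal sketch under assumed fusion weights, transition probabilities, and function names; none of these values or names come from the chapter itself.

```python
# Hypothetical sketch of the kind of fusion described in the abstract:
# per-turn anger scores from acoustic, linguistic, and interaction-parameter
# subclassifiers are combined, then smoothed with a first-order Markov prior
# over the caller's previous emotional state. All weights and probabilities
# here are illustrative assumptions.

# Transition prior: P(current state | previous state).
TRANSITIONS = {
    "angry":     {"angry": 0.7, "non-angry": 0.3},
    "non-angry": {"angry": 0.2, "non-angry": 0.8},
}

def fuse_scores(acoustic: float, linguistic: float, interaction: float) -> float:
    """Late fusion of per-turn anger scores in [0, 1] (weights are assumed)."""
    return 0.5 * acoustic + 0.3 * linguistic + 0.2 * interaction

def update_state(prev_state: str, turn_score: float) -> str:
    """Combine the fused turn score with the prior from the previous turn."""
    p_angry = TRANSITIONS[prev_state]["angry"] * turn_score
    p_non_angry = TRANSITIONS[prev_state]["non-angry"] * (1.0 - turn_score)
    return "angry" if p_angry > p_non_angry else "non-angry"

# Example: a call whose turns grow increasingly angry.
state = "non-angry"
for acoustic, linguistic, interaction in [(0.2, 0.1, 0.0),
                                          (0.6, 0.5, 0.8),
                                          (0.9, 0.7, 1.0)]:
    state = update_state(state, fuse_scores(acoustic, linguistic, interaction))
    print(state)  # non-angry, non-angry, angry
```

The prior makes the state estimate sticky: a single noisy turn is less likely to flip the label, which matches the intuition that a caller's emotional state evolves gradually over consecutive turns.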


Notes

  1. A sequence of five consecutive user turns.

  2. The user interrupts the system prompt by speaking.

  3. The term non-angry is used as an all-encompassing term for all emotions other than anger. However, since callers typically do not talk in a “happy,” “sad,” or “disgusted” manner to an IVR, “non-angry” speech consists predominantly of neutral speech.

  4. The classifier is tested with one part of the set and trained with the remaining nine parts. This process is iterated ten times and the performance is averaged (see the sketch following these notes).

  5. The NoInput event occurs when the caller does not reply to a system question within a certain time window.

  6. The NoMatch event is triggered when the ASR is unable to recognize the user utterance with the help of the activated grammars.
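To make notes 4 through 6 concrete, here is a minimal sketch of how per-turn interaction parameters such as NoInput and NoMatch counts could feed a classifier evaluated with the ten-fold procedure from note 4. The toolkit choice (scikit-learn), the feature layout, and the placeholder data are all assumptions, not details taken from the chapter.

```python
# Hypothetical sketch: interaction-parameter features evaluated with the
# ten-fold cross-validation described in note 4. All data is synthetic.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Placeholder per-turn features: [NoInput count, NoMatch count, barge-in count].
X = rng.integers(0, 4, size=(200, 3)).astype(float)
# Placeholder anger labels (1 = angry turn), loosely tied to the NoMatch count.
y = (X[:, 1] + rng.normal(0.0, 0.5, size=200) > 2).astype(int)

# Note 4: split into ten parts, train on nine, test on one, rotate, average.
scores = cross_val_score(SVC(), X, y, cv=10)
print(f"mean accuracy over 10 folds: {scores.mean():.3f}")
```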


Author information


Correspondence to Alexander Schmitt.


Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Schmitt, A., Pieraccini, R., Polzehl, T. (2010). “For Heaven’s Sake, Gimme a Live Person!” Designing Emotion-Detection Customer Care Voice Applications in Automated Call Centers. In: Neustein, A. (ed.) Advances in Speech Recognition. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-5951-5_9


  • DOI: https://doi.org/10.1007/978-1-4419-5951-5_9

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-5950-8

  • Online ISBN: 978-1-4419-5951-5

  • eBook Packages: Engineering, Engineering (R0)
