
Predicting Interaction Quality in Customer Service Dialogs

Chapter in: Advanced Social Interaction with Agents

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 510)

Abstract

In this paper, we apply the Interaction Quality (IQ) dialog evaluation framework to human-computer customer service dialogs. The IQ framework can be used to predict user satisfaction at the utterance level of a dialog. Such a rating framework is useful for online adaptation of dialog system behavior and for increasing user engagement through personalization. We annotated a dataset of 120 human-computer dialogs from two customer service application domains with IQ scores. Our inter-annotator agreement (\(\rho = 0.72/0.66\)) is similar to the agreement observed on the IQ annotations of the publicly available LEGO bus information corpus. An in-domain SVM model trained on a small set of call center domain dialogs achieves a correlation of \(\rho = 0.53/0.56\) with the annotated IQ scores. A generic model built exclusively on public LEGO data achieves 94%/65% of the in-domain model's performance. An adapted model, built by extending the public dataset with a small set of dialogs in a target domain, achieves 102%/81% of the in-domain model's performance.
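As a rough illustration of the evaluation measure used above, the sketch below computes Spearman's \(\rho\) [15] with scipy.stats.spearmanr on hypothetical utterance-level IQ ratings. The ratings and dialog are invented for illustration only; the sketch assumes the standard five-point IQ scale of the IQ annotation scheme [13], not the paper's actual data.

```python
# Minimal sketch: Spearman's rho as used both for inter-annotator agreement
# and for scoring predicted Interaction Quality (IQ) against annotated IQ.
# The rating values below are hypothetical; only the 1-5 IQ scale follows
# the standard IQ annotation scheme [13].
from scipy.stats import spearmanr

# Hypothetical utterance-level IQ ratings from two annotators for one dialog.
annotator_1 = [5, 5, 4, 4, 3, 3, 2, 3, 4, 4]
annotator_2 = [5, 4, 4, 3, 3, 2, 2, 3, 3, 4]

rho, p_value = spearmanr(annotator_1, annotator_2)
print(f"inter-annotator agreement: rho={rho:.2f} (p={p_value:.3f})")

# The same statistic measures how well model predictions track the annotation.
predicted_iq = [5, 5, 4, 3, 3, 3, 2, 2, 4, 4]
rho_model, _ = spearmanr(annotator_1, predicted_iq)
print(f"model vs. annotation: rho={rho_model:.2f}")
```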


Notes

  1. The barge-in feature is GENERIC but not recorded in the INTER dataset.

  2. We use LinearSVC from the sklearn package with the default parameters (a minimal sketch follows these notes).

  3. We report the results on the INTER corpus using LEGO1 for training, as it achieved higher scores than the models trained on LEGO2.

  4. This result is comparable to the result reported in [19] on the LEGO corpus.

  5. The heat map is drawn on a logarithmic scale.

References

  1. Beringer N, Kartal U, Louka K, Schiel F, Türk U, et al (2002) PROMISE: a procedure for multimodal interactive system evaluation. In: Multimodal resources and multimodal systems evaluation workshop, p 14

  2. Cohen J (1968) Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull 70:213-220

  3. Evanini K, Hunter P, Liscombe J, Suendermann D, Dayanidhi K, Pieraccini R (2008) Caller experience: a method for evaluating dialog systems and its automatic prediction. In: Proceedings of the 2008 IEEE spoken language technology workshop (SLT 2008), pp 129-132

  4. Hartikainen M, Salonen EP, Turunen M (2004) Subjective evaluation of spoken dialogue systems using SERVQUAL method. In: INTERSPEECH

  5. Hone KS, Graham R (2000) Towards a tool for the subjective assessment of speech system interfaces (SASSI). Nat Lang Eng 6(3-4):287-303

  6. Pérez F, Granger BE (2007) IPython: a system for interactive scientific computing. Comput Sci Eng 9(3):21-29

  7. Pragst L, Ultes S, Minker W (2017) Recurrent neural network interaction quality estimation. Springer Singapore, Singapore, pp 381-393

  8. Raux A, Langner B, Black A, Eskenazi M (2005) Let's Go public! Taking a spoken dialog system to the real world. In: Proceedings of Eurospeech

  9. Reichheld FF (2004) The one number you need to grow. Harv Bus Rev 81(12):46-54

  10. Roy S, Mariappan R, Dandapat S, Srivastava S, Galhotra S, Peddamuthu B (2016) QART: a system for real-time holistic quality assurance for contact center dialogues. In: Proceedings of the thirtieth AAAI conference on artificial intelligence

  11. Schmitt A, Hank C, Liscombe J (2008) Detecting problematic dialogs with automated agents. In: Proceedings of the 4th IEEE tutorial and research workshop on perception and interactive technologies for speech-based systems: perception in multimodal dialogue systems. Springer, Berlin, Heidelberg, pp 72-80

  12. Schmitt A, Schatz B, Minker W (2011) Modeling and predicting quality in spoken human-computer interaction. In: Proceedings of the SIGDIAL 2011 conference. Association for Computational Linguistics, pp 173-184

  13. Schmitt A, Ultes S (2015) Interaction quality: assessing the quality of ongoing spoken dialog interaction by experts, and how it relates to user satisfaction. Speech Commun 74:12-36

  14. Schmitt A, Ultes S, Minker W (2012) A parameterized and annotated spoken dialog corpus of the CMU Let's Go bus information system. In: Proceedings of the eighth international conference on language resources and evaluation (LREC'12). European Language Resources Association (ELRA), Istanbul, Turkey

  15. Spearman C (1904) The proof and measurement of association between two things. Am J Psychol 15:88-103

  16. Suendermann D, Liscombe J, Pieraccini R (2010) Minimally invasive surgery for spoken dialog systems. In: INTERSPEECH, pp 98-101

  17. Ultes S, Kraus M, Schmitt A, Minker W (2015) Quality-adaptive spoken dialogue initiative selection and implications on reward modelling. In: Proceedings of the 16th annual meeting of the special interest group on discourse and dialogue. Association for Computational Linguistics, Prague, Czech Republic, pp 374-383

  18. Ultes S, Minker W (2014) Interaction quality estimation in spoken dialogue systems using hybrid-HMMs. In: Proceedings of the SIGDIAL 2014 conference, the 15th annual meeting of the special interest group on discourse and dialogue, Philadelphia, PA, USA, pp 208-217

  19. Ultes S, Sánchez MJP, Schmitt A, Minker W (2015) Analysis of an extended interaction quality corpus. In: Natural language dialog systems and intelligent assistants. Springer, pp 41-52

  20. Walker M, Kamm C, Litman D (2000) Towards developing general models of usability with PARADISE. Nat Lang Eng 6(3-4):363-377

  21. Walker MA, Langkilde-Geary I, Hastie HW, Wright JH, Gorin A (2002) Automatically training a problematic dialogue predictor for a spoken dialogue system. J Artif Intell Res 16(1):293-319


Author information

Correspondence to Svetlana Stoyanchev.

Copyright information

© 2019 Springer International Publishing AG, part of Springer Nature

About this chapter


Cite this chapter

Stoyanchev, S., Maiti, S., Bangalore, S. (2019). Predicting Interaction Quality in Customer Service Dialogs. In: Eskenazi, M., Devillers, L., Mariani, J. (eds) Advanced Social Interaction with Agents. Lecture Notes in Electrical Engineering, vol 510. Springer, Cham. https://doi.org/10.1007/978-3-319-92108-2_16


  • DOI: https://doi.org/10.1007/978-3-319-92108-2_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-92107-5

  • Online ISBN: 978-3-319-92108-2

  • eBook Packages: Engineering (R0)
