Multimodal Dialogue System Evaluation: A Case Study Applying Usability Standards

Conference paper

In: 9th International Workshop on Spoken Dialogue System Technology

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 579)

Abstract

This paper presents an approach to the evaluation of multimodal dialogue systems, applying usability metrics defined in ISO standards. Users' perceptions of effectiveness, efficiency and satisfaction were correlated with various performance metrics derived from system logfiles and reference annotations. Usability experts rated questions from a preliminary 110-item questionnaire, and an assessment of their agreement on usability concepts led to a selection of eight main factors: task completion, task quality, robustness, learnability, flexibility, likeability, ease of use, and usefulness (value) of an application. Based on these factors, an internally consistent and reliable questionnaire with 32 items (Cronbach's alpha of 0.87) was produced. This questionnaire was used to evaluate the Virtual Negotiation Coaching system for metacognitive skills training in a multi-issue bargaining setting. The observed correlations between usability perception and the derived performance metrics suggest that overall system usability is determined by the quality of the agreements reached, by the robustness and flexibility of the interaction, and by the quality of system responses.
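
The reported reliability figure follows the standard definition of Cronbach's alpha, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). As a minimal illustration of how such a figure is computed from raw questionnaire data, the NumPy sketch below applies this formula to a simulated 20 x 32 matrix of Likert ratings; the simulated data and all names in it are our own assumptions, not material from the paper.

```python
import numpy as np

def cronbach_alpha(responses: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) response matrix."""
    k = responses.shape[1]                         # number of items (32 in the paper)
    item_vars = responses.var(axis=0, ddof=1)      # sample variance of each item
    total_var = responses.sum(axis=1).var(ddof=1)  # variance of respondents' total scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Simulated data: 20 respondents, 32 items, 5-point Likert scale. A shared
# per-respondent component induces the inter-item correlation that alpha measures.
rng = np.random.default_rng(0)
base = rng.integers(1, 6, size=(20, 1))
noise = rng.integers(-1, 2, size=(20, 32))
ratings = np.clip(base + noise, 1, 5)

print(f"Cronbach's alpha = {cronbach_alpha(ratings):.2f}")
```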

Notes

  1. The usability questionnaire for the multimodal dialogue system evaluation is provided in the Appendix.

  2. Pareto optimality reflects a state of affairs in which there is no alternative state that would make any party better off without making another worse off.

  3. Positional bargaining involves holding on to a fixed set of preferences regardless of the interests of others.

  4. Examples of resources are the Wall Street Journal WSJ0 corpus, the HUB4 News Broadcast data and the VoxForge corpus.

  5. We consider the overall negotiation task completed if the parties agreed on all four issues, or if they concluded that no agreement could be reached.

  6. Overall task quality was computed in terms of the number of reward points the trainee receives at the end of each negotiation round, summed over multiple repeated rounds, together with Pareto optimality (see footnote 2); a sketch of these computations follows these notes.

  7. We considered negative deals as flawed negotiation actions: the sum of all reached agreements resulted in an overall negative value, meaning that the trainee made too many concessions and selected mostly dispreferred, bright 'orange' options (see Fig. 1).

  8. For now this is only a general observation; the metric will be taken into consideration in future test-retest experiments.

  9. Performance metrics related to initiative and task substitutivity, and their impact on perceived usability, will be a topic for future research.

  10. A system action is appropriate given the context if it introduces or continues a repair strategy.

  11. A system action is considered correct if it addresses the user's actions as intended and expected; such actions exclude recovery actions and error handling.
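
Footnotes 2, 6 and 7 above reduce to simple computations over the point values of agreed options. The following sketch illustrates them under assumed data structures; the issue names, option labels and payoff values are hypothetical stand-ins, not taken from the paper or the Virtual Negotiation Coaching system.

```python
from itertools import product

# Hypothetical payoff tables for four negotiation issues: each option maps to
# (trainee points, opponent points). All values are invented for illustration.
ISSUES = {
    "salary":   {"low": (1, 4), "mid": (2, 2), "high": (4, 1)},
    "hours":    {"35h": (3, 1), "40h": (2, 2)},
    "vacation": {"20d": (1, 3), "25d": (3, 1)},
    "car":      {"yes": (2, -3), "no": (-2, 2)},  # negative points model dispreferred options
}

def payoff(deal):
    """Total (trainee, opponent) reward points for a deal covering all issues."""
    trainee = sum(ISSUES[issue][opt][0] for issue, opt in deal.items())
    opponent = sum(ISSUES[issue][opt][1] for issue, opt in deal.items())
    return trainee, opponent

def is_pareto_optimal(deal):
    """Footnote 2: no alternative deal makes one party better off
    without making the other worse off."""
    t, o = payoff(deal)
    for options in product(*(ISSUES[issue] for issue in ISSUES)):
        alt_t, alt_o = payoff(dict(zip(ISSUES, options)))
        if alt_t >= t and alt_o >= o and (alt_t > t or alt_o > o):
            return False
    return True

deal = {"salary": "mid", "hours": "40h", "vacation": "25d", "car": "no"}
points, _ = payoff(deal)
print("reward points (task quality, footnote 6):", points)
print("negative deal, i.e. flawed (footnote 7):", points < 0)
print("Pareto-optimal (footnote 2):", is_pareto_optimal(deal))
```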

Author information

Correspondence to Volha Petukhova.

Appendix

Usability Perception Questionnaire: Multimodal Dialogue System

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Malchanau, A., Petukhova, V., Bunt, H. (2019). Multimodal Dialogue System Evaluation: A Case Study Applying Usability Standards. In: D'Haro, L., Banchs, R., Li, H. (eds) 9th International Workshop on Spoken Dialogue System Technology. Lecture Notes in Electrical Engineering, vol 579. Springer, Singapore. https://doi.org/10.1007/978-981-13-9443-0_13
