Multimodal Dialogue System Evaluation: A Case Study Applying Usability Standards

Conference paper

In: 9th International Workshop on Spoken Dialogue System Technology

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 579)

Abstract

This paper presents an approach to the evaluation of multimodal dialogue systems, applying usability metrics defined in ISO standards. Users' perceptions of effectiveness, efficiency and satisfaction were correlated with various performance metrics derived from system logfiles and reference annotations. Usability experts rated questions from a preliminary 110-item questionnaire, and an assessment of their agreement on usability concepts led to a selection of eight main factors: task completion, task quality, robustness, learnability, flexibility, likeability, ease of use, and usefulness (value) of an application. Based on these factors, an internally consistent and reliable questionnaire with 32 items (Cronbach's alpha of 0.87) was produced. This questionnaire was used to evaluate the Virtual Negotiation Coaching system for metacognitive skills training in a multi-issue bargaining setting. The observed correlations between usability perception and the derived performance metrics suggest that overall system usability is determined by the quality of the agreements reached, by the robustness and flexibility of the interaction, and by the quality of system responses.
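
The reported reliability figure follows the standard definition of Cronbach's alpha, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). As a minimal illustration of how such a figure is computed from raw questionnaire data, the NumPy sketch below applies this formula to a simulated 20 x 32 matrix of Likert ratings; the simulated data and all names in it are our own assumptions, not material from the paper.

```python
import numpy as np

def cronbach_alpha(responses: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) response matrix."""
    k = responses.shape[1]                         # number of items (32 in the paper)
    item_vars = responses.var(axis=0, ddof=1)      # sample variance of each item
    total_var = responses.sum(axis=1).var(ddof=1)  # variance of respondents' total scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Simulated data: 20 respondents, 32 items, 5-point Likert scale. A shared
# per-respondent component induces the inter-item correlation that alpha measures.
rng = np.random.default_rng(0)
base = rng.integers(1, 6, size=(20, 1))
noise = rng.integers(-1, 2, size=(20, 32))
ratings = np.clip(base + noise, 1, 5)

print(f"Cronbach's alpha = {cronbach_alpha(ratings):.2f}")
```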

Notes

  1. The usability questionnaire for the multimodal dialogue system evaluation is provided in the Appendix.

  2. Pareto optimality reflects a state of affairs in which there is no alternative state that would make any party better off without making another worse off.

  3. Positional bargaining involves holding on to a fixed set of preferences regardless of the interests of others.

  4. Examples of resources are the Wall Street Journal WSJ0 corpus, the HUB4 News Broadcast data and the VoxForge corpus.

  5. We consider the overall negotiation task completed if the parties agreed on all four issues, or if they concluded that no agreement could be reached.

  6. Overall task quality was computed in terms of the number of reward points the trainee receives at the end of each negotiation round, summed over multiple repeated rounds, together with Pareto optimality (see footnote 2); a sketch of these computations follows these notes.

  7. We considered negative deals as flawed negotiation actions: the sum of all reached agreements resulted in an overall negative value, meaning that the trainee made too many concessions and selected mostly dispreferred, bright 'orange' options (see Fig. 1).

  8. For now this is only a general observation; the metric will be taken into consideration in future test-retest experiments.

  9. Performance metrics related to initiative and task substitutivity, and their impact on perceived usability, will be a topic for future research.

  10. A system action is appropriate given the context if it introduces or continues a repair strategy.

  11. A system action is considered correct if it addresses the user's actions as intended and expected; such actions exclude recovery actions and error handling.
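
Footnotes 2, 6 and 7 above reduce to simple computations over the point values of agreed options. The following sketch illustrates them under assumed data structures; the issue names, option labels and payoff values are hypothetical stand-ins, not taken from the paper or the Virtual Negotiation Coaching system.

```python
from itertools import product

# Hypothetical payoff tables for four negotiation issues: each option maps to
# (trainee points, opponent points). All values are invented for illustration.
ISSUES = {
    "salary":   {"low": (1, 4), "mid": (2, 2), "high": (4, 1)},
    "hours":    {"35h": (3, 1), "40h": (2, 2)},
    "vacation": {"20d": (1, 3), "25d": (3, 1)},
    "car":      {"yes": (2, -3), "no": (-2, 2)},  # negative points model dispreferred options
}

def payoff(deal):
    """Total (trainee, opponent) reward points for a deal covering all issues."""
    trainee = sum(ISSUES[issue][opt][0] for issue, opt in deal.items())
    opponent = sum(ISSUES[issue][opt][1] for issue, opt in deal.items())
    return trainee, opponent

def is_pareto_optimal(deal):
    """Footnote 2: no alternative deal makes one party better off
    without making the other worse off."""
    t, o = payoff(deal)
    for options in product(*(ISSUES[issue] for issue in ISSUES)):
        alt_t, alt_o = payoff(dict(zip(ISSUES, options)))
        if alt_t >= t and alt_o >= o and (alt_t > t or alt_o > o):
            return False
    return True

deal = {"salary": "mid", "hours": "40h", "vacation": "25d", "car": "no"}
points, _ = payoff(deal)
print("reward points (task quality, footnote 6):", points)
print("negative deal, i.e. flawed (footnote 7):", points < 0)
print("Pareto-optimal (footnote 2):", is_pareto_optimal(deal))
```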

Author information

Correspondence to Volha Petukhova.

Appendix

Usability Perception Questionnaire: Multimodal Dialogue System

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Malchanau, A., Petukhova, V., Bunt, H. (2019). Multimodal Dialogue System Evaluation: A Case Study Applying Usability Standards. In: D'Haro, L., Banchs, R., Li, H. (eds) 9th International Workshop on Spoken Dialogue System Technology. Lecture Notes in Electrical Engineering, vol 579. Springer, Singapore. https://doi.org/10.1007/978-981-13-9443-0_13
