Abstract
Presentation skills assessment is one of the central challenges of multimodal modeling. Presentation skills comprise verbal and nonverbal components, but because people demonstrate these skills in widely varying ways, the observed multimodal features also vary widely. As a result, when test samples are drawn from a distribution that differs from that of the training samples, prediction accuracy often degrades. In machine learning theory, this problem of biased training (source) data is known as sample selection bias or covariate shift. To address it, this paper presents an instance weighting adaptation method for estimating the presentation skills of each participant from multimodal (verbal and nonverbal) features. For this purpose, we collect a novel multimodal presentation dataset that includes audio signals, body motion sensor data, and transcripts of the speech content for participants observed in 58 presentation sessions. The dataset also includes both verbal and nonverbal presentation skill scores, assessed by two external experts from a human resources department. We extract multimodal features, such as spoken utterances, acoustic features, and the amount of body motion, to estimate the presentation skills. We propose two approaches, early fusion and late fusion, for regression models based on multimodal instance weighting adaptation. The experimental results show that the early fusion regression model with instance weighting adaptation achieved a Pearson correlation of \(\rho = 0.39\), the regression accuracy for the clarity of presentation goal elements. In the best case, the correlation coefficient is improved from \(-0.34\) to \(+0.35\) by instance weighting adaptation.
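The core idea of instance weighting adaptation can be illustrated with a toy example. This is a minimal sketch, not the authors' implementation: the paper builds on direct density-ratio estimation methods (e.g., least-squares importance fitting), whereas this toy approximates the importance weights \(w(x) = p_{\text{test}}(x)/p_{\text{train}}(x)\) with kernel density estimates and plugs them into a weighted least-squares regression. The data, the function `f`, and all variable names are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Source (training) features concentrated at low values; target (test)
# features shifted toward high values: a simple covariate shift.
x_src = rng.normal(0.0, 1.0, 300)
x_tgt = rng.normal(2.0, 1.0, 300)

# Hypothetical nonlinear "skill" function; a linear model is misspecified,
# so the weighting changes which input region the fit prioritizes.
def f(x):
    return np.sin(x)

y_src = f(x_src) + 0.1 * rng.normal(size=x_src.size)

# Importance weights w(x) = p_test(x) / p_train(x), here via kernel
# density estimates (a stand-in for direct density-ratio estimation).
w = gaussian_kde(x_tgt)(x_src) / gaussian_kde(x_src)(x_src)

# Weighted least squares: scale each row by sqrt(w) before solving.
X = np.column_stack([np.ones_like(x_src), x_src])
sw = np.sqrt(w)
coef_w, *_ = np.linalg.lstsq(X * sw[:, None], y_src * sw, rcond=None)
coef_u, *_ = np.linalg.lstsq(X, y_src, rcond=None)

# Compare both fits on the target region: the weighted fit typically
# tracks the target distribution better under covariate shift.
Xt = np.column_stack([np.ones_like(x_tgt), x_tgt])
err_w = np.mean((Xt @ coef_w - f(x_tgt)) ** 2)
err_u = np.mean((Xt @ coef_u - f(x_tgt)) ** 2)
```

The same sqrt-weight trick applies to any least-squares learner, which is how instance weighting composes with the early and late fusion regression models described above.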
Notes
The spoken content of the presentations includes private information related to the company and the presenters, so the dataset is not publicly available due to privacy policies.
The lecturers provide feedback comments to the attendees after the program, covering both the strong points of each presentation and the points to be improved.
Acknowledgements
We appreciate the cooperation of the human resource development department of Softbank Corp. This work was partially supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Numbers 19H01120, 19H01719 and JST AIP Trilateral AI Research, Grant Number JPMJCR20G6, Japan.
Yutaro Yagi and Shogo Okada equally contributed.
Cite this article
Yagi, Y., Okada, S., Shiobara, S. et al. Predicting multimodal presentation skills based on instance weighting domain adaptation. J Multimodal User Interfaces 16, 1–16 (2022). https://doi.org/10.1007/s12193-021-00367-x