Investigating Effectiveness of Linguistic Features Based on Speech Recognition for Storytelling Skill Assessment

  • Shogo Okada
  • Kazunori Komatani
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10868)


This paper investigates the effectiveness of linguistic features based on speech recognition for storytelling skill assessment in group conversations. A multimodal data corpus, including the skill scores of storytellers, is used for this study. Three kinds of automatic speech recognition (ASR) results are compared in terms of their contribution to the skill assessment task. A regression model that predicts the skill is trained by fusing the linguistic features with nonverbal features, including utterance length, prosody, gaze, and head and hand gestures. Experimental results show that the mean regression accuracy for the storytelling skills with linguistic features based on ASR results at a 49% recognition rate \((R^2 = 0.24)\) is 0.07 points higher than that of the nonverbal-only model \((R^2 = 0.17)\). We conclude that the features extracted from text contribute to the skill assessment task, even though the ASR results contain a considerable number of errors.
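The approach described above, feature-level fusion of linguistic and nonverbal features feeding a regression model evaluated by \(R^2\), can be sketched as follows. This is a minimal illustration with synthetic random data, a closed-form ridge regressor, and hypothetical feature dimensions; the paper's actual feature extraction, regressor, and evaluation protocol are not specified in this excerpt.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 40 storytellers, linguistic features (e.g. word
# statistics from ASR transcripts) and nonverbal features (utterance length,
# prosody, gaze, and gesture statistics), plus annotated skill scores.
n = 40
X_ling = rng.random((n, 20))
X_nonverbal = rng.random((n, 8))
y = rng.random(n)

# Early (feature-level) fusion: concatenate the two feature sets.
X = np.hstack([X_ling, X_nonverbal])

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression with a bias column appended."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    d = Xb.shape[1]
    return np.linalg.solve(Xb.T @ Xb + alpha * np.eye(d), Xb.T @ y)

def ridge_predict(X, w):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

w = ridge_fit(X, y)
r2 = r2_score(y, ridge_predict(X, w))
print(f"R^2: {r2:.3f}")
```

In practice the model would be evaluated with cross-validation rather than on the training data, and the fused model would be compared against a nonverbal-only baseline, mirroring the \(R^2 = 0.24\) vs. \(R^2 = 0.17\) comparison reported above.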


Keywords: Storytelling skill assessment · Multimodal interaction · Automatic linguistic analysis



This work was performed under the Research Program of “Dynamic Alliance for Open Innovation Bridging Human, Environment and Materials” in “Network Joint Research Center for Materials and Devices” and Japan Society for the Promotion of Science (JSPS) KAKENHI (15K00300, 15H02746).



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Japan Advanced Institute of Science and Technology, Nomi, Japan
  2. Osaka University, Suita, Japan