Human vs. Computer Performance in Voice-Based Recognition of Interpersonal Stance

  • Daniel Formolo
  • Tibor Bosse
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10271)


This paper presents an algorithm to automatically detect interpersonal stance in vocal signals. The focus is on two stances (referred to as ‘Dominant’ and ‘Empathic’) that play a crucial role in aggression de-escalation. To develop the algorithm, a database was first created with more than 1000 samples from 8 speakers of different nationalities. In addition to creating the algorithm, a detailed analysis of the samples was performed in an attempt to relate interpersonal stance to emotional state. Finally, by means of an experiment on Mechanical Turk, the performance of the algorithm was compared with that of human participants. The resulting algorithm provides a useful basis for developing computer-based support for interpersonal skills training.
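The abstract does not spell out the recognition pipeline, but a typical voice-based stance or emotion recogniser extracts acoustic features per frame and feeds their statistics to a trained classifier. The sketch below is purely illustrative and not the authors' method: the two features (RMS energy and zero-crossing rate), the fixed threshold, and the function names are hypothetical stand-ins; real systems extract thousands of low-level descriptors and learn the decision boundary from labelled data.

```python
import math

def extract_features(samples, frame_len=160):
    """Toy prosodic features per frame: RMS energy and zero-crossing rate.
    Illustrative only; production feature extractors compute far richer
    descriptor sets (pitch, spectral, voice-quality features, etc.)."""
    feats = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / frame_len
        feats.append((rms, zcr))
    return feats

def classify_stance(feats, energy_threshold=0.5):
    """Hypothetical decision rule: high average energy -> 'Dominant',
    low -> 'Empathic'. A real system would replace this fixed threshold
    with a classifier trained on labelled speech samples."""
    mean_rms = sum(f[0] for f in feats) / len(feats)
    return 'Dominant' if mean_rms > energy_threshold else 'Empathic'
```

For example, a loud alternating waveform (`[0.9, -0.9] * 400`) is labelled 'Dominant' by this toy rule, while a quiet one (`[0.1, -0.1] * 400`) is labelled 'Empathic'.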


Keywords: Emotion recognition · Voice · Interpersonal stance · Experiments



This research was supported by the Brazilian scholarship program Science without Borders - CNPq (scholarship reference: 233883/2014-2).



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
