Abstract
The objective of this ongoing research is to automatically identify the role played by a speaker in a dialogue, and to explore conditions that might lead to higher role identification rates. We use an interactive Map Task setup with two roles, follower and leader, in which each speaker participated twice, thus acting in both roles with the same interlocutor. The paper aims to identify the speaker’s role and to explore the potential influence of the speaker’s gender, the interlocutor’s gender, and the order in which the speaker played the two roles. Using deep learning procedures over a set of acoustic features, we automatically trace the footprints of the role in the speech signal. Results show an average role classification rate of 73.3%. We further show that role classification rates differ significantly depending on the interlocutor’s gender. On average, speakers tend to identify with their role more clearly when the interlocutor is a man (77.5%) than when the interlocutor is a woman (69.9%).
Acknowledgments
This work was supported by the Open Media and Information Lab (OMILab) at The Open University of Israel [Grant Number 20184] and by research grant #507761 from the Research Authority at The Open University of Israel.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Lerner, A., Miara, O., Malayev, S., Silber-Varod, V. (2018). The Influence of the Interlocutor’s Gender on the Speaker’s Role Identification. In: Karpov, A., Jokisch, O., Potapova, R. (eds) Speech and Computer. SPECOM 2018. Lecture Notes in Computer Science, vol 11096. Springer, Cham. https://doi.org/10.1007/978-3-319-99579-3_34
DOI: https://doi.org/10.1007/978-3-319-99579-3_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99578-6
Online ISBN: 978-3-319-99579-3
eBook Packages: Computer Science, Computer Science (R0)