Abstract
This article presents approaches for automatically recognizing the roles people play in a wide range of interaction settings. The proposed role recognition approach includes two main steps. The first step represents the individuals involved in an interaction with feature vectors that account for their relationships with others. This step includes three main stages: segmentation of the audio into turns (i.e., time intervals during which only one person talks), conversion of the sequence of turns into a social network, and use of the social network to extract features for each person. The second step uses machine learning methods to map the feature vectors into roles. The experiments have been carried out over roughly 90 hours of material. This is not only one of the largest databases ever used in the literature on role recognition, but also, to the best of our knowledge, the only one that includes different interaction settings. In the experiments, the accuracy, measured as the percentage of data correctly labeled in terms of roles, is roughly 80% in production environments and 70% in spontaneous exchanges (lexical features have been added in the latter case).
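The first step can be illustrated with a minimal Python sketch. It is not the chapter's actual method: it assumes the turns are already available as (speaker, start, end) tuples, builds a simple "who speaks after whom" network, and derives three illustrative features per person. The function name `turn_features`, the feature choice, and the sample speakers are hypothetical.

```python
from collections import defaultdict


def turn_features(turns):
    """Map each speaker to (talk time in seconds, number of turns, number of distinct followers).

    `turns` is a list of (speaker, start_sec, end_sec) tuples sorted by start time.
    This is an illustrative assumption, not the feature set used in the chapter.
    """
    talk_time = defaultdict(float)
    n_turns = defaultdict(int)
    followers = defaultdict(set)  # network edges: speaker -> people who talk right after
    for i, (speaker, start, end) in enumerate(turns):
        talk_time[speaker] += end - start
        n_turns[speaker] += 1
        if i + 1 < len(turns):
            next_speaker = turns[i + 1][0]
            if next_speaker != speaker:
                followers[speaker].add(next_speaker)
    return {
        s: (talk_time[s], n_turns[s], len(followers[s]))
        for s in talk_time
    }


# Hypothetical turn sequence, as a speaker segmentation stage might produce it.
turns = [
    ("anchor", 0, 10),
    ("guest", 10, 25),
    ("anchor", 25, 30),
    ("reporter", 30, 50),
]
features = turn_features(turns)
```

Feature vectors of this kind could then be passed to any classifier in the second step; which features and which classifier work best is an empirical question that this sketch does not address.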
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Favre, S. (2014). Turns Analysis for Automatic Role Recognition. In: Murray-Smith, R. (eds) Mobile Social Signal Processing. MSSP 2010. Lecture Notes in Computer Science, vol 8045. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54325-8_2
DOI: https://doi.org/10.1007/978-3-642-54325-8_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54324-1
Online ISBN: 978-3-642-54325-8
eBook Packages: Computer Science (R0)