Turns Analysis for Automatic Role Recognition

  • Sarah Favre
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8045)


This article presents approaches for automatically recognizing the roles people play in a wide range of interaction settings. The proposed role recognition approach includes two main steps. The first step represents the individuals involved in an interaction with feature vectors that account for their relationships with others. This step includes three main stages: segmentation of the audio into turns (i.e., time intervals during which only one person talks), conversion of the sequence of turns into a social network, and use of the social network to extract features for each person. The second step uses machine learning methods to map the feature vectors into roles. The experiments have been carried out over roughly 90 hours of material. This is not only one of the largest databases ever used in the literature on role recognition but also, to the best of our knowledge, the only one that includes different interaction settings. In the experiments, the accuracy, measured as the percentage of data correctly labeled in terms of roles, is roughly 80% in production environments and 70% in spontaneous exchanges (lexical features have been added in the latter case).
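
The first step described above (turns, then a network, then per-person features) can be illustrated with a minimal sketch. It is not the paper's actual feature extraction: the abstract does not specify how the social network is built, so the sketch assumes a simple "who speaks right after whom" graph, and the function name `turns_to_features`, the four features, and the toy turn list are all hypothetical.

```python
from collections import defaultdict


def turns_to_features(turns):
    """Map a sequence of turns to one feature vector per speaker.

    `turns` is a list of (speaker, start_sec, end_sec) tuples sorted by
    start time. The network here is an assumption: an edge a -> b exists
    if b took the floor immediately after a at least once.
    """
    talk_time = defaultdict(float)
    n_turns = defaultdict(int)
    successors = defaultdict(set)    # speaker -> people who spoke right after
    predecessors = defaultdict(set)  # speaker -> people who spoke right before

    prev = None
    for speaker, start, end in turns:
        talk_time[speaker] += end - start
        n_turns[speaker] += 1
        if prev is not None and prev != speaker:
            successors[prev].add(speaker)
            predecessors[speaker].add(prev)
        prev = speaker

    total = sum(talk_time.values())
    features = {}
    for s in talk_time:
        features[s] = (
            talk_time[s] / total if total else 0.0,  # share of speaking time
            n_turns[s],                              # number of turns
            len(successors[s]),                      # out-degree in the network
            len(predecessors[s]),                    # in-degree in the network
        )
    return features


# Toy turn sequence (invented for illustration only).
turns = [
    ("anchor", 0.0, 10.0),
    ("guest", 10.0, 25.0),
    ("anchor", 25.0, 30.0),
    ("reporter", 30.0, 50.0),
    ("anchor", 50.0, 55.0),
]
features = turns_to_features(turns)
```

In this toy example the speaker labeled "anchor" ends up connected to both other speakers, while each of them is connected only to the anchor. Vectors of this kind are what the second step would pass to a classifier trained on role-labeled data.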


Automatic Speech Recognition, Lexical Feature, Interaction Setting, Lexical Choice, Market Expert
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Sarah Favre (1, 2)
  1. Idiap Research Institute, Martigny, Switzerland
  2. Ecole Polytechnique Federale de Lausanne, Lausanne, Switzerland
