Using Audio, Visual, and Lexical Features in a Multi-modal Virtual Meeting Director

Al-Hames, Marc; Hörnler, Benedikt; Scheuermann, Christoph; Rigoll, Gerhard

doi:10.1007/11965152_6

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4299)

Included in the following conference series:

International Workshop on Machine Learning for Multimodal Interaction

Abstract

Multi-modal recordings of meetings provide the basis for meeting browsing and for remote meetings. However, it is often not useful to store or transmit all visual channels. In this work we show how a virtual meeting director selects one of seven possible video modes. We then present several audio, visual, and lexical features for a virtual director. In an experimental section we evaluate the features, their influence on the camera selection, and the properties of the generated video stream. The chosen features all allow real-time or near-real-time processing and can therefore be applied not only to offline browsing but also to a remote meeting assistant.
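
As a rough illustration of what selecting one of seven video modes per frame could look like, the sketch below picks the highest-scoring mode and only allows a switch once the current mode has been held for a minimum number of frames. Everything here is an assumption made for illustration: the abstract does not name the seven modes, does not say how features are turned into scores, and does not describe any hold rule. The names `select_modes`, `scores`, and `min_hold` are hypothetical.

```python
from typing import List, Sequence

# The abstract states that there are seven video modes; which modes they are
# is not given there, so they are referred to by index 0..6 only.
NUM_MODES = 7


def select_modes(scores: Sequence[Sequence[float]], min_hold: int = 3) -> List[int]:
    """Pick one mode index per frame from per-frame, per-mode scores.

    `scores[t][m]` is a hypothetical score for mode `m` at frame `t`, assumed
    to come from some combination of audio, visual, and lexical features.
    A switch is allowed only after the current mode has been shown for at
    least `min_hold` frames (an assumed smoothing rule, not from the paper).
    """
    selected: List[int] = []
    current = -1  # no mode chosen yet
    held = 0      # number of frames the current mode has been shown
    for frame_scores in scores:
        if len(frame_scores) != NUM_MODES:
            raise ValueError("expected one score per video mode")
        # Index of the best-scoring mode; ties resolve to the lowest index.
        best = max(range(NUM_MODES), key=lambda m: frame_scores[m])
        if current == -1 or (best != current and held >= min_hold):
            current = best
            held = 1
        else:
            held += 1
        selected.append(current)
    return selected
```

A hold rule of this kind trades responsiveness for a calmer output stream. Whether the paper uses anything similar cannot be determined from the abstract.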

Author information

Authors and Affiliations

Institute for Human-Machine-Communication, Technische Universität München, Arcisstr. 21, 80290, Munich, Germany
Marc Al-Hames, Benedikt Hörnler, Christoph Scheuermann & Gerhard Rigoll

Editor information

Editors and Affiliations

University of Edinburgh, Edinburgh, Scotland
Steve Renals
IDIAP Research Institute, Martigny, Switzerland
Samy Bengio
National Institute of Standards and Technology, 100 Bureau Drive Stop 8940, Gaithersburg, MD, 20899
Jonathan G. Fiscus

About this paper

Cite this paper

Al-Hames, M., Hörnler, B., Scheuermann, C., Rigoll, G. (2006). Using Audio, Visual, and Lexical Features in a Multi-modal Virtual Meeting Director. In: Renals, S., Bengio, S., Fiscus, J.G. (eds) Machine Learning for Multimodal Interaction. MLMI 2006. Lecture Notes in Computer Science, vol 4299. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11965152_6

DOI: https://doi.org/10.1007/11965152_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69267-6
Online ISBN: 978-3-540-69268-3
eBook Packages: Computer Science, Computer Science (R0)
