VACE Multimodal Meeting Corpus

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNISA, volume 3869)

Abstract

In this paper, we report on the infrastructure we have developed to support our research on multimodal cues for understanding meetings. Because our focus is multimodality, we investigate the interaction among speech, gesture, posture, and gaze in meetings. For this purpose, a high-quality multimodal corpus is being produced.

Keywords

  • Meeting Room
  • Multimodal Interface
  • Speech Recognizer
  • Audio Processing
  • Audio Channel

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

  • Chapter length: 12 pages

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, L. et al. (2006). VACE Multimodal Meeting Corpus. In: Renals, S., Bengio, S. (eds) Machine Learning for Multimodal Interaction. MLMI 2005. Lecture Notes in Computer Science, vol 3869. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11677482_4

  • DOI: https://doi.org/10.1007/11677482_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32549-9

  • Online ISBN: 978-3-540-32550-5

  • eBook Packages: Computer Science (R0)