VACE Multimodal Meeting Corpus

  • Lei Chen
  • R. Travis Rose
  • Ying Qiao
  • Irene Kimbara
  • Fey Parrill
  • Haleema Welji
  • Tony Xu Han
  • Jilin Tu
  • Zhongqiang Huang
  • Mary Harper
  • Francis Quek
  • Yingen Xiong
  • David McNeill
  • Ronald Tuttle
  • Thomas Huang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3869)

Abstract

In this paper, we report on the infrastructure we have developed to support our research on multimodal cues for understanding meetings. With our focus on multimodality, we investigate the interaction among speech, gesture, posture, and gaze in meetings. For this purpose, a high-quality multimodal corpus is being produced.

Keywords

Meeting Room, Multimodal Interface, Speech Recognizer, Audio Processing, Audio Channel
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Lei Chen (1)
  • R. Travis Rose (2)
  • Ying Qiao (2)
  • Irene Kimbara (3)
  • Fey Parrill (3)
  • Haleema Welji (3)
  • Tony Xu Han (4)
  • Jilin Tu (4)
  • Zhongqiang Huang (1)
  • Mary Harper (1)
  • Francis Quek (2)
  • Yingen Xiong (2)
  • David McNeill (3)
  • Ronald Tuttle (5)
  • Thomas Huang (4)

  1. School of Electrical Engineering, Purdue University, West Lafayette, USA
  2. CHCI, Department of Computer Science, Virginia Tech, Blacksburg, USA
  3. Department of Psychology, University of Chicago, Chicago, USA
  4. Beckman Institute, University of Illinois Urbana Champaign, Urbana, USA
  5. Air Force Institute of Technology, Dayton, USA
