
Smart Posterboard: Multi-modal Sensing and Analysis of Poster Conversations

Abstract

Conversations in poster sessions at academic events, referred to as poster conversations, raise interesting and challenging problems in multi-modal, multi-party interaction. This article gives an overview of our CREST project on the smart posterboard for multi-modal conversation analysis. The smart posterboard has multiple sensing devices to record poster conversations, so that we can review who came to the poster and what questions or comments they made. The conversation analysis combines speech and image processing, including face and eye-gaze tracking, speech enhancement, and speaker diarization. It is shown that eye-gaze information is useful for predicting turn-taking and also for improving speaker diarization. Moreover, high-level indexing of the audience's interest and comprehension level is explored based on their multi-modal behaviors during the conversation. This is realized by predicting the audience's speech acts, such as questions and reactive tokens.
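To make the idea of fusing eye-gaze with audio cues concrete, here is a minimal illustrative sketch, not the chapter's actual model: a toy logistic scorer that combines a listener's gaze-at-presenter ratio with the pause length at a candidate turn boundary to estimate the probability of a turn-take. The feature choice and weights are assumptions for illustration only.

```python
import math

def turn_take_prob(gaze_at_presenter: float, pause_ms: float) -> float:
    """Toy logistic fusion of an eye-gaze feature and a prosodic/timing feature.

    gaze_at_presenter: fraction of the analysis window in which the listener
                       gazes at the presenter (0.0 to 1.0)
    pause_ms: duration of the silence at the candidate turn boundary
    The weights below are hand-set for illustration, not taken from the chapter.
    """
    z = 3.0 * gaze_at_presenter + 0.004 * pause_ms - 3.5
    return 1.0 / (1.0 + math.exp(-z))

# Sustained gaze plus a long pause should score higher than
# averted gaze plus a short pause.
high = turn_take_prob(0.9, 600)
low = turn_take_prob(0.1, 100)
print(high > low)  # True
```

In the actual system such features would be extracted by the posterboard's face/eye-gaze tracker and speech processing front-end, and the weights would be learned from annotated poster conversations rather than set by hand.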


Notes

  1. We used different Japanese wording for interest and for surprise to enhance the reliability of the evaluation; we adopt the result only if the two match.

  2. This does not mean the presenter actually answered with a simple “Yes” or “No”.


Acknowledgments

This work was conducted by the members of the CREST project including Hiromasa Yoshimoto, Tony Tung, Yukoh Wakabayashi, Kouhei Sumi, Zhi-Qiang Chang, Takuma Iwatate, Soichiro Hayashi, Koji Inoue, Katsuya Takanashi (Kyoto University) and Yuji Onuma, Shunsuke Nakai, Ryoichi Miyazaki, Hiroshi Saruwatari (Nara Institute of Science and Technology).

Author information

Correspondence to Tatsuya Kawahara.


Copyright information

© 2016 Springer Japan

Cite this chapter

Kawahara, T. (2016). Smart Posterboard: Multi-modal Sensing and Analysis of Poster Conversations. In: Nishida, T. (eds) Human-Harmonized Information Technology, Volume 1. Springer, Tokyo. https://doi.org/10.1007/978-4-431-55867-5_9

  • Print ISBN: 978-4-431-55865-1

  • Online ISBN: 978-4-431-55867-5
