Multimedia Tools and Applications, Volume 69, Issue 1, pp 53–77

A multimedia presentation system using a 3D gesture interface in museums

  • Fu-Song Hsu
  • Wei-Yang Lin


Multimedia presentations have become an indispensable feature of museum exhibits in recent years. Advances in technology have increased the relevance of studying digital communication through computational devices. Devices such as multi-touch screens and cameras are essential for natural communication, and the most obvious applications use entertainment to attract users. This study focuses on the use of cameras to support natural interaction by visitors during museum presentations. We first outline a platform called the “U-Garden,” a set of tools that assists application designers in developing movement-based projects that employ camera tracking. We then establish a rationale on which to base the design of such presentation tools. The system supports natural interaction based on depth image streams and provides tracking results that designers can use to build a wide range of interactive applications.


Keywords: Interactive system design; Interactive multimedia presentation; Depth image



This work was partially supported by the National Taiwan Museum of Fine Arts and funded by the Interactive Media Art Workshop. A number of individuals made substantial contributions to improving this paper. The authors would like to express their gratitude to Professor Tzu-Wei Tsai for many interesting discussions. They would also like to thank Professor Geeng-Neng You and Hsiu-Mei Huang for their useful suggestions and assistance on the previous version, and Jia-Fen Hung for the development work on motion sensing.



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi, Taiwan
