A Vision-Based Remote Control

  • Björn Stenger
  • Thomas Woodley
  • Roberto Cipolla
Chapter
Part of the Studies in Computational Intelligence book series (SCI, volume 285)

Abstract

This chapter presents a vision-based system for touch-free interaction with a display at a distance. A single camera is fixed on top of the screen and points towards the user. An attention mechanism allows the user to start the interaction and control a screen pointer by moving their hand in a fist pose directed at the camera. On-screen items can be chosen by a selection mechanism. Current sample applications include browsing video collections as well as viewing a gallery of 3D objects, which the user can rotate with their hand motion. We include an up-to-date review of hand tracking methods and comment on the merits and shortcomings of previous approaches. The proposed tracker uses multiple cues (appearance, color, and motion) for robustness. As the space of possible observation models is generally too large for exhaustive online search, we select models that are suitable for the particular tracking task at hand. During a training stage, various off-the-shelf trackers are evaluated. From this data, different methods of fusing them online are investigated, including parallel and cascaded tracker evaluation. For the case of fist tracking, combining a small number of observers in a cascade results in an efficient algorithm that is used in our gesture interface. The system has been on public display at conferences, where over a hundred users have engaged with it.
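
The cascaded evaluation mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the observer interface, the confidence thresholds, and the toy observers `cheap_observer` and `robust_observer` are all hypothetical and only show the general idea that a later, costlier observer runs only when an earlier one is not confident.

```python
from typing import Callable, List, Optional, Tuple

Position = Tuple[float, float]
# Assumed interface: an observer maps (frame, previous position)
# to (estimated position, confidence in [0, 1]).
Observer = Callable[[object, Position], Tuple[Position, float]]


def cascade_track(
    frame: object,
    prev: Position,
    stages: List[Tuple[Observer, float]],
) -> Optional[Tuple[Position, int]]:
    """Evaluate observers in order and return the first confident estimate.

    Returns (position, index of the stage that produced it), or None when
    every stage falls below its threshold (treated here as tracking loss).
    """
    for index, (observer, threshold) in enumerate(stages):
        position, confidence = observer(frame, prev)
        if confidence >= threshold:
            # Later observers are skipped, which is where the saving comes from.
            return position, index
    return None


def cheap_observer(frame: object, prev: Position) -> Tuple[Position, float]:
    # Hypothetical fast observer: confident only on frames labelled "easy".
    confidence = 0.9 if frame == "easy" else 0.2
    return (prev[0] + 1.0, prev[1]), confidence


def robust_observer(frame: object, prev: Position) -> Tuple[Position, float]:
    # Hypothetical slower observer: fails only on frames labelled "lost".
    confidence = 0.1 if frame == "lost" else 0.8
    return (prev[0] + 2.0, prev[1]), confidence


# Stage order and thresholds are illustrative values, not taken from the chapter.
STAGES: List[Tuple[Observer, float]] = [
    (cheap_observer, 0.5),
    (robust_observer, 0.5),
]
```

Calling `cascade_track("easy", (0.0, 0.0), STAGES)` returns the estimate of the first stage, while a frame labelled `"hard"` falls through to the second stage. A parallel scheme would instead run every observer on every frame and fuse the results, trading extra computation for redundancy.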

Keywords

Linear Discriminant Analysis, Gesture Recognition, Hand Gesture, Motion Blur, Normalize Cross Correla
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Björn Stenger (1)
  • Thomas Woodley (2)
  • Roberto Cipolla (2)
  1. Computer Vision Group, Toshiba Research Europe, Cambridge, UK
  2. Department of Engineering, University of Cambridge, UK
