Three Dimensional Information Extraction and Applications to Video Analysis

Part of the book series: Studies in Computational Intelligence (SCI, volume 287)

Abstract

This chapter explores extracting three-dimensional (3D) features from video and using those features to aid video analysis and mining tasks. 3D information is rarely used in video analysis in the literature because of the inherent difficulty of building such a system. When the only input is a video stream, with no prior knowledge of the scene or camera (a typical scenario in video analysis), computing an accurate 3D representation is difficult. However, several recently proposed methods can be applied to solve the problem efficiently, including simultaneous localization and mapping (SLAM), structure from motion, and 3D reconstruction. These methods are surveyed in the context of video analysis and demonstrated on videos from TRECVID 2005; their limitations are also discussed. Once an accurate 3D representation of a video is obtained, it can be used to improve the performance and accuracy of existing systems for various video analysis and mining tasks. The advantages of a 3D representation are illustrated on several of these tasks, including shot boundary detection, object recognition, content-based video retrieval, and human activity recognition. The chapter concludes with a discussion of the limitations of existing 3D methods and of future research directions.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Donate, A., Liu, X. (2010). Three Dimensional Information Extraction and Applications to Video Analysis. In: Schonfeld, D., Shan, C., Tao, D., Wang, L. (eds) Video Search and Mining. Studies in Computational Intelligence, vol 287. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12900-1_4

  • DOI: https://doi.org/10.1007/978-3-642-12900-1_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12899-8

  • Online ISBN: 978-3-642-12900-1

  • eBook Packages: Engineering, Engineering (R0)
