Abstract
This chapter explores extracting three-dimensional (3D) features from a video and using those features to aid various video analysis and mining tasks. 3D information is rarely used for video analysis in the literature because of the inherent difficulties of building such a system. When the only input is a video stream, with no prior knowledge of the scene or camera (a typical scenario in video analysis), computing an accurate 3D representation is difficult; however, several recently proposed methods can be applied to solve the problem efficiently, including simultaneous localization and mapping, structure from motion, and 3D reconstruction. These methods are surveyed in the context of video analysis and demonstrated on videos from TRECVID 2005; their limitations are also discussed. Once an accurate 3D representation of a video is obtained, it can be used to improve the performance and accuracy of existing systems for various video analysis and mining tasks. The advantages of a 3D representation are illustrated on several of these tasks, including shot boundary detection, object recognition, content-based video retrieval, and human activity recognition. The chapter concludes with a discussion of the limitations of existing 3D methods and of future research directions.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Donate, A., Liu, X. (2010). Three Dimensional Information Extraction and Applications to Video Analysis. In: Schonfeld, D., Shan, C., Tao, D., Wang, L. (eds) Video Search and Mining. Studies in Computational Intelligence, vol 287. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12900-1_4
DOI: https://doi.org/10.1007/978-3-642-12900-1_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12899-8
Online ISBN: 978-3-642-12900-1
eBook Packages: Engineering, Engineering (R0)