Three Dimensional Information Extraction and Applications to Video Analysis

Donate, Arturo; Liu, Xiuwen

doi:10.1007/978-3-642-12900-1_4



Part of the book series: Studies in Computational Intelligence (SCI, volume 287)

Abstract

This chapter explores extracting three-dimensional features from video and using them to aid video analysis and mining tasks. 3D information is rarely used for video analysis in the literature because of the inherent difficulty of building such a system. When the only input is a video stream, with no prior knowledge of the scene or camera (a typical scenario in video analysis), computing an accurate 3D representation is difficult. However, several recently proposed methods can be applied to solve the problem efficiently, including simultaneous localization and mapping, structure from motion, and 3D reconstruction. These methods are surveyed in the context of video analysis and demonstrated on videos from TRECVID 2005; their limitations are also discussed. Once an accurate 3D representation of a video is obtained, it can be used to improve the performance and accuracy of existing systems for various video analysis and mining tasks. The advantages of a 3D representation are illustrated on several such tasks: shot boundary detection, object recognition, content-based video retrieval, and human activity recognition. The chapter concludes with a discussion of the limitations of existing 3D methods and of future research directions.



Author information

Authors and Affiliations

Department of Computer Science, Florida State University, Tallahassee, FL, 32312
Arturo Donate & Xiuwen Liu


Editor information

Editors and Affiliations

Multimedia Communications Laboratory, Department of Electrical & Computer Engineering, University of Illinois at Chicago, Room 1020 SEO (M/C 154), 851 South Morgan Street, Chicago, IL, 60607-7053, USA
Dan Schonfeld
Philips Research, High-Tech Campus 36, 5656 AE, Eindhoven, The Netherlands
Caifeng Shan
Department of Computing, Hong Kong Polytechnic University, 7/F, Building P, Hung Hom, PQ704, Kowloon, Hong Kong, China
Dacheng Tao
Department of Computer Science, University of Bath, BA2 7AY, United Kingdom
Liang Wang


About this chapter

Cite this chapter

Donate, A., Liu, X. (2010). Three Dimensional Information Extraction and Applications to Video Analysis. In: Schonfeld, D., Shan, C., Tao, D., Wang, L. (eds) Video Search and Mining. Studies in Computational Intelligence, vol 287. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12900-1_4


DOI: https://doi.org/10.1007/978-3-642-12900-1_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12899-8
Online ISBN: 978-3-642-12900-1
eBook Packages: Engineering, Engineering (R0)
