Feature Pooling Using Spatio-Temporal Constrain for Video Summarization and Retrieval

  • Conference paper
Advanced Multimedia and Ubiquitous Engineering

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 393)

Abstract

A content-based video retrieval method based on visual feature pooling is proposed in this paper. Since visual words represent local features extracted from frame images, spatio-temporal constraints are applied to resolve the ambiguity of the model and enable effective retrieval of semantic video clips. Both shot-level and segment-level processing are employed, and the latter is found to be more robust for complex scenes where accurate video segmentation may fail. Our experimental results show that the constrained scheme improves average matching accuracy by 5 %. They also suggest that videos summarized to 25–30 % of their original size can still maintain a viewing quality of 70–80 %, supporting fast content delivery.
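The sketch below illustrates one way such spatio-temporally constrained pooling could be realized: local descriptors are quantized against a visual-word codebook and pooled over a shot or segment, with a word's contribution down-weighted when it drifts spatially between nearby frames. This is a minimal illustration only; the codebook, the Gaussian weighting, and the window size are assumptions, not the authors' exact formulation.

```python
import numpy as np

def quantize(descriptors, codebook):
    """Hard-assign each local descriptor to its nearest visual word."""
    # descriptors: (n, d) array, codebook: (k, d) array of cluster centres
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

def pool_segment(frames, codebook, spatial_sigma=0.2, temporal_window=5):
    """Pool visual words over one shot or segment (illustrative constraint).

    frames: list of dicts, each with
        'desc': (n_i, d) local descriptors of frame i (assumed precomputed)
        'xy'  : (n_i, 2) keypoint positions normalized to [0, 1]
    A word seen again within `temporal_window` frames is down-weighted by
    how far it has moved, a simple stand-in for a spatio-temporal constraint.
    """
    hist = np.zeros(codebook.shape[0])
    last_seen = {}  # word id -> (frame index, position)
    for t, frame in enumerate(frames):
        words = quantize(frame['desc'], codebook)
        for w, xy in zip(words, frame['xy']):
            weight = 1.0
            if w in last_seen and t - last_seen[w][0] <= temporal_window:
                drift = np.linalg.norm(xy - last_seen[w][1])
                weight = np.exp(-(drift ** 2) / (2 * spatial_sigma ** 2))
            hist[w] += weight
            last_seen[w] = (t, xy)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

Two segments (or a query clip and a database segment) could then be compared by cosine similarity or histogram intersection of their pooled vectors; the exact matching measure used by the authors is not specified in the abstract.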



Author information

Correspondence to Jie Ren.


Copyright information

© 2016 Springer Science+Business Media Singapore

About this paper

Cite this paper

Ren, J., Ren, J. (2016). Feature Pooling Using Spatio-Temporal Constrain for Video Summarization and Retrieval. In: Park, J., Jin, H., Jeong, YS., Khan, M. (eds) Advanced Multimedia and Ubiquitous Engineering. Lecture Notes in Electrical Engineering, vol 393. Springer, Singapore. https://doi.org/10.1007/978-981-10-1536-6_50

  • DOI: https://doi.org/10.1007/978-981-10-1536-6_50

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-1535-9

  • Online ISBN: 978-981-10-1536-6

  • eBook Packages: Computer Science, Computer Science (R0)
