Shot Boundary Detection with Spatial-Temporal Convolutional Neural Networks

  • Lifang Wu
  • Shuai Zhang
  • Meng Jian (corresponding author)
  • Zhijia Zhao
  • Dong Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11257)


Digital video is now widely used to record and share events and people's daily lives, which makes automatic semantic analysis and management of video an urgent need. Shot boundary detection (SBD), which aims to automatically detect the boundary frames of shots in a video, plays a fundamental role in many video analysis tasks. In this paper, we propose a progressive method for shot boundary detection that combines histogram-based shot filtering with C3D-based gradual shot detection. Abrupt shot changes are detected first because of their distinctive character; dividing the whole video into segments at these changes also helps avoid locating a transition across different shots. Then, over the segments, gradual shot detection is performed by a three-dimensional convolutional neural network model, which assigns video clips to one of four shot types: normal, dissolve, foi (fade-out/in), or swipe. Finally, for untrimmed videos, a frame-level merging strategy is constructed to help locate shot boundaries from neighboring frames. The experimental results demonstrate that the proposed method can effectively detect shots and locate their boundaries.
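The first stage described above, histogram-based filtering of abrupt changes followed by division of the video into segments, can be illustrated with a minimal sketch. The abstract does not specify the histogram type, distance measure, or threshold, so everything below is an assumption made for illustration: grayscale frames given as nested lists, 16 intensity bins, a normalized L1 histogram distance, a fixed threshold of 0.5, and the hypothetical function names `gray_histogram`, `detect_cuts`, and `split_into_segments`. It is not the authors' implementation.

```python
from typing import List, Sequence, Tuple

# Assumed input format: a grayscale frame as rows of 0..255 intensities.
Frame = Sequence[Sequence[int]]


def gray_histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Return a normalized intensity histogram with `bins` equal-width bins."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for pixel in row:
            counts[min(pixel * bins // 256, bins - 1)] += 1
            total += 1
    return [c / total for c in counts]


def detect_cuts(frames: Sequence[Frame], threshold: float = 0.5,
                bins: int = 16) -> List[int]:
    """Return indices i where frame i differs abruptly from frame i - 1.

    The distance is half the L1 difference between normalized histograms,
    so it lies in [0, 1]. The threshold is an illustrative assumption.
    """
    cuts: List[int] = []
    prev = None
    for i, frame in enumerate(frames):
        hist = gray_histogram(frame, bins)
        if prev is not None:
            distance = 0.5 * sum(abs(a - b) for a, b in zip(prev, hist))
            if distance > threshold:
                cuts.append(i)
        prev = hist
    return cuts


def split_into_segments(num_frames: int,
                        cuts: Sequence[int]) -> List[Tuple[int, int]]:
    """Split [0, num_frames) into half-open segments at the cut positions."""
    bounds = [0, *cuts, num_frames]
    return [(start, end) for start, end in zip(bounds, bounds[1:]) if end > start]


def flat_frame(value: int, height: int = 4, width: int = 4) -> List[List[int]]:
    """Build a synthetic frame of uniform intensity (toy data only)."""
    return [[value] * width for _ in range(height)]


# Toy video: three uniform "shots" of different brightness.
video = [flat_frame(10)] * 3 + [flat_frame(200)] * 4 + [flat_frame(90)] * 2
cuts = detect_cuts(video)
segments = split_into_segments(len(video), cuts)
print(cuts, segments)
```

In the pipeline the abstract outlines, each resulting segment would then be passed to the gradual-transition classifier, so that no clip handed to the network straddles an abrupt cut.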


Shot boundary detection, Shot transition, Video indexing, Convolutional neural networks, Spatial-temporal feature



This work was supported in part by the Beijing Municipal Education Commission Science and Technology Innovation Project under Grant KZ201610005012, in part by the Beijing excellent young talent cultivation project under Grant 2017000020124G075, and in part by China Postdoctoral Science Foundation funded projects under Grants 2017M610027 and 2018T110019.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Lifang Wu (1)
  • Shuai Zhang (1)
  • Meng Jian (1), corresponding author
  • Zhijia Zhao (1)
  • Dong Wang (1)
  1. Faculty of Information Technology, Beijing University of Technology, Beijing, China
