Soccer Video Event Detection Using 3D Convolutional Networks and Shot Boundary Detection via Deep Feature Distance

  • Tingxi Liu
  • Yao Lu
  • Xiaoyu Lei
  • Lijing Zhang
  • Haoyu Wang
  • Wei Huang
  • Zijian Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10635)


In this work, we propose a novel framework that combines temporal action localization with play-break (PB) rules for soccer video event detection. First, we treat event detection as an action-level task and adopt 3D convolutional networks to perform action localization. Then, we employ PB rules to organize actions into events, using the long views and replay logos detected in the first step. Finally, we determine the semantic class of each event according to its principal actions, which contain the key semantic information of highlights. For long untrimmed videos, we propose a shot boundary detection method based on deep feature distance (DFD) to reduce the number of proposals and improve localization performance. Experimental results on a new dataset, which contains 152 classes of semantic actions and scenes in soccer video, verify the effectiveness of our framework.
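The idea of detecting shot boundaries from a distance between deep features can be illustrated with a minimal sketch. The abstract does not specify the distance metric, the threshold, or which network layer supplies the features, so everything below is an assumption made for illustration: cosine distance between consecutive per-frame feature vectors, a fixed threshold of 0.5, and the hypothetical helper names `cosine_distance`, `detect_shot_boundaries`, and `shots_from_boundaries`. The toy two-dimensional vectors stand in for real network features.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return 1.0 - dot / (norm_a * norm_b)


def detect_shot_boundaries(
    features: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[int]:
    """Return every index i at which frame i starts a new shot.

    Assumption: a boundary is declared when the distance between the
    features of frames i-1 and i exceeds a fixed threshold.
    """
    boundaries = []
    for i in range(1, len(features)):
        if cosine_distance(features[i - 1], features[i]) > threshold:
            boundaries.append(i)
    return boundaries


def shots_from_boundaries(
    n_frames: int, boundaries: Sequence[int]
) -> List[Tuple[int, int]]:
    """Convert boundary indices into half-open (start, end) frame ranges."""
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [n_frames]
    return [(s, e) for s, e in zip(starts, ends) if e > s]


# Toy per-frame features (placeholders for deep features of real frames).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.0]]
boundaries = detect_shot_boundaries(features, threshold=0.5)
shots = shots_from_boundaries(len(features), boundaries)
print(boundaries)  # [2, 4]
print(shots)       # [(0, 2), (2, 4), (4, 5)]
```

In a setup like this, each resulting shot could serve as a candidate segment for the action localization stage, which is one way that segmenting by shots might reduce the number of proposals compared with dense sliding windows.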


Keywords: Soccer event detection; Temporal action localization; 3D convolutional networks; Deep feature distance



This work was supported by the National Natural Science Foundation of China (No. 61273273).



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Tingxi Liu (1)
  • Yao Lu (1), corresponding author
  • Xiaoyu Lei (1)
  • Lijing Zhang (1)
  • Haoyu Wang (1)
  • Wei Huang (1)
  • Zijian Wang (1, 2)

  1. Beijing Laboratory of Intelligent Information Technology, School of Computer Science, Beijing Institute of Technology, Beijing, China
  2. China Central Television, Beijing, China
