Abstract
In this work, we propose a novel framework that combines temporal action localization with play-break (PB) rules for soccer video event detection. First, we treat event detection at the action level and adopt 3D convolutional networks to perform action localization. Then we employ PB rules to organize actions into events, using the long views and replay logos detected in the first step. Finally, we determine the semantic class of each event according to its principal actions, which contain the key semantic information of highlights. For long untrimmed videos, we propose a shot boundary detection method using deep feature distance (DFD) to reduce the number of proposals and improve localization performance. Experimental results verify the effectiveness of our framework on a new dataset that contains 152 classes of semantic actions and scenes in soccer video.
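The idea of shot boundary detection via a distance between deep features can be illustrated with a minimal sketch. The abstract does not state which network produces the features, which distance is used, or how the threshold is chosen, so everything below is an assumption made for illustration: the per-frame features are toy vectors standing in for CNN activations, the distance is cosine distance, and the names `cosine_distance`, `detect_shot_boundaries`, `split_into_shots`, and the threshold value 0.5 are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity (assumed metric, not taken from the paper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an all-zero feature as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def detect_shot_boundaries(
    features: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[int]:
    """Return each index i where frame i is assumed to start a new shot.

    A boundary is declared when the feature distance between frame i-1
    and frame i exceeds the (hypothetical) threshold.
    """
    return [
        i
        for i in range(1, len(features))
        if cosine_distance(features[i - 1], features[i]) > threshold
    ]


def split_into_shots(num_frames: int, boundaries: Sequence[int]) -> List[Tuple[int, int]]:
    """Convert boundary indices into half-open (start, end) frame ranges."""
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [num_frames]
    return [(s, e) for s, e in zip(starts, ends) if e > s]


# Toy per-frame features standing in for deep features of five frames.
features = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
    [0.0, 0.0, 1.0],
]
boundaries = detect_shot_boundaries(features, threshold=0.5)
shots = split_into_shots(len(features), boundaries)
print(boundaries, shots)
```

Restricting localization proposals to the resulting shot ranges is one plausible way such boundaries could reduce the number of proposals for long untrimmed videos; the method actually used in the paper may differ.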
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 61273273).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Liu, T. et al. (2017). Soccer Video Event Detection Using 3D Convolutional Networks and Shot Boundary Detection via Deep Feature Distance. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol 10635. Springer, Cham. https://doi.org/10.1007/978-3-319-70096-0_46
DOI: https://doi.org/10.1007/978-3-319-70096-0_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70095-3
Online ISBN: 978-3-319-70096-0
eBook Packages: Computer Science (R0)