
Hierarchical convolutional features for end-to-end representation-based visual tracking

  • Special Issue Paper
  • Published in: Machine Vision and Applications

Abstract

Deep learning has recently been applied widely in computer vision. In this paper, a novel yet simple deep-learning tracker is proposed to perform the tracking task. A fully convolutional Siamese network is applied to capture the similarity between different frames. However, the detailed information from lower layers, which is also important for locating the target object, is usually not incorporated into the tracking task. In this paper, detailed information from two lower layers is fused into the response map, improving tracking performance without adding much computation time. This leads to a more significant improvement in feature representation and localization of the target object. The experimental results demonstrate that the proposed algorithm is efficient and robust compared with the baseline and state-of-the-art trackers.
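The fusion described in the abstract can be sketched as follows: a Siamese tracker scores each location of a search region by cross-correlating it with the exemplar's features, and per-layer response maps are combined by a weighted sum. This is a minimal NumPy illustration, not the authors' implementation; the fusion weights and the assumption that per-layer feature maps share the same spatial size are hypothetical.

```python
import numpy as np

def xcorr(search, exemplar):
    """Valid cross-correlation of an exemplar feature map over a search
    feature map, summing over channels -- the basic Siamese similarity score."""
    H, W, _ = search.shape
    h, w, _ = exemplar.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            out[i, j] = np.sum(search[i:i + h, j:j + w] * exemplar)
    return out

def fused_response(search_feats, exemplar_feats, weights):
    """Weighted sum of per-layer response maps, each min-max normalized
    so that layers with different score scales can be combined."""
    fused = None
    for s, e, w in zip(search_feats, exemplar_feats, weights):
        r = xcorr(s, e)
        r = (r - r.min()) / (r.max() - r.min() + 1e-12)
        fused = w * r if fused is None else fused + w * r
    return fused
```

The peak of the fused map gives the predicted target location; in practice the per-layer correlations would be computed by a convolution layer on GPU rather than explicit loops.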



Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 61472110, 61772161, 61532006, 61320106006 and 61601158, and by the Zhejiang Provincial Science Foundation under Grant LQ16F030004.

Author information


Corresponding author

Correspondence to Suguo Zhu.


About this article


Cite this article

Zhu, S., Fang, Z. & Gao, F. Hierarchical convolutional features for end-to-end representation-based visual tracking. Machine Vision and Applications 29, 955–963 (2018). https://doi.org/10.1007/s00138-018-0947-6

