Learning Siamese Network with Top-Down Modulation for Visual Tracking

Yao, Yingjie; Wu, Xiaohe; Zuo, Wangmeng; Zhang, David

doi:10.1007/978-3-030-02698-1_33

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11266)

Included in the following conference series:

International Conference on Intelligent Science and Big Data Engineering

Abstract

The performance of visual object tracking depends largely on the target appearance model. Benefiting from the success of CNNs in feature extraction, recent studies have paid much attention to CNN representation learning and feature fusion models. However, existing feature fusion models ignore the relation between the features of different layers. In this paper, we propose a deep feature fusion model based on the siamese network that takes into account the connection between CNN feature maps. To handle the differing feature map sizes in a CNN, we fuse feature maps of different resolutions by introducing de-convolutional layers in the offline training stage. Specifically, top-down modulation is adopted for feature fusion. In the tracking stage, a simple matching operation between the fused features of the exemplar and the search region is conducted with the learned model, which maintains real-time tracking speed. Experimental results show that the proposed method achieves favorable tracking accuracy against state-of-the-art trackers while running in real time.
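
The pipeline outlined in the abstract (fuse a coarse, deep feature map into a finer, shallow one, then match the fused exemplar against the fused search region) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes single-channel feature maps, uses nearest-neighbour upsampling as a stand-in for the learned de-convolutional layers, uses fixed fusion weights instead of learned ones, and performs matching as plain unnormalized cross-correlation. All function names are hypothetical.

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling (assumed stand-in for a learned de-convolution)."""
    return [[v for v in row for _ in range(factor)]
            for row in fmap for _ in range(factor)]


def top_down_fuse(fine, coarse, factor, w_lateral=1.0, w_top_down=1.0):
    """Add the upsampled coarse (deep) map to the fine (shallow) map."""
    up = upsample_nearest(coarse, factor)
    if len(up) != len(fine) or len(up[0]) != len(fine[0]):
        raise ValueError("upsampled coarse map must match the fine map size")
    return [[w_lateral * f + w_top_down * u for f, u in zip(f_row, u_row)]
            for f_row, u_row in zip(fine, up)]


def cross_correlate(search, exemplar):
    """Slide the exemplar over the search map and return the response map."""
    sh, sw = len(search), len(search[0])
    eh, ew = len(exemplar), len(exemplar[0])
    return [[sum(search[i + di][j + dj] * exemplar[di][dj]
                 for di in range(eh) for dj in range(ew))
             for j in range(sw - ew + 1)]
            for i in range(sh - eh + 1)]


def argmax2d(response):
    """Return (row, col) of the first maximum in the response map."""
    positions = ((i, j) for i in range(len(response))
                 for j in range(len(response[0])))
    return max(positions, key=lambda p: response[p[0]][p[1]])


# Toy data (made up for illustration): the target sits in the
# bottom-right corner of the search region.
exemplar_fine = [[1, 2], [3, 4]]
exemplar_coarse = [[1]]
search_fine = [[0, 0, 0, 0],
               [0, 0, 0, 0],
               [0, 0, 1, 2],
               [0, 0, 3, 4]]
search_coarse = [[0, 0],
                 [0, 1]]

z = top_down_fuse(exemplar_fine, exemplar_coarse, factor=2)
x = top_down_fuse(search_fine, search_coarse, factor=2)
response = cross_correlate(x, z)
print(argmax2d(response))  # (2, 2)
```

Because the fusion here is a fixed weighted sum, the tracking-stage cost is one pass per branch plus a single correlation, which is consistent with the abstract's point that matching stays simple. A learned variant would replace the upsampling and the weights with trained layers.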

Author information

Authors and Affiliations

Vision Perception and Cognition, Harbin Institute of Technology, Harbin, China
Yingjie Yao, Xiaohe Wu, Wangmeng Zuo & David Zhang
The Chinese University of Hong Kong (Shenzhen), Shenzhen, China
David Zhang

Corresponding author

Correspondence to Xiaohe Wu.

Editor information

Editors and Affiliations

Peking University, Beijing, China
Yuxin Peng
Shanghai Jiao Tong University, Shanghai, China
Kai Yu
Tsinghua University, Beijing, China
Jiwen Lu
Central China Normal University, Wuhan, China
Xingpeng Jiang

About this paper

Cite this paper

Yao, Y., Wu, X., Zuo, W., Zhang, D. (2018). Learning Siamese Network with Top-Down Modulation for Visual Tracking. In: Peng, Y., Yu, K., Lu, J., Jiang, X. (eds) Intelligence Science and Big Data Engineering. IScIDE 2018. Lecture Notes in Computer Science, vol 11266. Springer, Cham. https://doi.org/10.1007/978-3-030-02698-1_33

DOI: https://doi.org/10.1007/978-3-030-02698-1_33
Published: 09 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02697-4
Online ISBN: 978-3-030-02698-1
eBook Packages: Computer Science, Computer Science (R0)
