
Hyper-Siamese network for robust visual tracking

  • Original Paper
  • Published:
Signal, Image and Video Processing

Abstract

Matching-based tracking has drawn increasing interest in the object tracking field, among which the SiamFC tracker shows great potential for achieving high accuracy and efficiency. However, the feature representations of the target in SiamFC are extracted by the last layer of a convolutional neural network and mainly capture semantic information, which makes SiamFC drift easily in the presence of similar distractors. Considering that the different layers of a convolutional neural network characterize the target from different perspectives, and that the lower-level feature maps of SiamFC are computed beforehand anyway, in this paper we design a skip-layer connection network named Hyper-Siamese to aggregate the hierarchical feature maps of SiamFC and constitute hyper-feature representations of the target. The Hyper-Siamese network is trained end-to-end offline on the ILSVRC2015 dataset and later utilized for online tracking. By visualizing the outputs of different layers and comparing the tracking results under various concatenation modes of layers, we show that all convolutional layers are useful for object tracking. Experimental results on the OTB100 and TC128 benchmarks demonstrate that our proposed algorithm performs favorably against not only the baseline tracker SiamFC (2.9% gain in OS rate and 2.8% gain in DP rate on OTB100) but also many state-of-the-art trackers. Meanwhile, our proposed tracker achieves a real-time tracking speed (25 fps).
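The skip-layer aggregation described above can be illustrated with a minimal NumPy sketch: feature maps from several convolutional layers are resized to a common spatial resolution, concatenated along the channel axis into a hyper-feature, and the template hyper-feature is then matched against the search-region hyper-feature by sliding cross-correlation, as in SiamFC. This is not the authors' implementation; the layer shapes, nearest-neighbor resizing, and the naive correlation loop are assumptions chosen for clarity.

```python
import numpy as np

def resize_nearest(fmap, out_h, out_w):
    """Resize a (C, H, W) feature map to (C, out_h, out_w) by nearest-neighbor indexing."""
    C, H, W = fmap.shape
    rows = np.arange(out_h) * H // out_h
    cols = np.arange(out_w) * W // out_w
    return fmap[:, rows][:, :, cols]

def hyper_feature(layer_maps, out_size):
    """Aggregate hierarchical feature maps into one hyper-feature.

    Each layer's map is resized to a common (out_size, out_size) grid and
    the results are concatenated along the channel axis.
    """
    resized = [resize_nearest(f, out_size, out_size) for f in layer_maps]
    return np.concatenate(resized, axis=0)

def cross_correlate(template, search):
    """SiamFC-style matching: slide the template hyper-feature over the
    search hyper-feature and record the inner product at every offset.

    template: (C, t, t), search: (C, s, s) -> response map (s-t+1, s-t+1).
    """
    C, t, _ = template.shape
    _, s, _ = search.shape
    out = s - t + 1
    resp = np.empty((out, out))
    for i in range(out):
        for j in range(out):
            resp[i, j] = np.sum(template * search[:, i:i + t, j:j + t])
    return resp
```

The peak of the response map indicates the most likely target location; because shallow layers contribute fine spatial detail and deep layers contribute semantics, the concatenated hyper-feature discriminates better against similar distractors than the last layer alone.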



Author information


Corresponding author

Correspondence to Yangliu Kuai.


About this article


Cite this article

Kuai, Y., Wen, G. & Li, D. Hyper-Siamese network for robust visual tracking. SIViP 13, 35–42 (2019). https://doi.org/10.1007/s11760-018-1325-6

