Signal, Image and Video Processing, Volume 13, Issue 1, pp 35–42

Hyper-Siamese network for robust visual tracking

  • Yangliu Kuai
  • Gongjian Wen
  • Dongdong Li
Original Paper

Abstract

Matching-based tracking has drawn increasing interest in the object tracking field, among which the SiamFC tracker shows great potential in achieving high accuracy and efficiency. However, the feature representations of the target in SiamFC are extracted by the last layer of a convolutional neural network and mainly capture semantic information, which makes SiamFC drift easily in the presence of similar distractors. Considering that different layers of a convolutional neural network characterize the target from different perspectives, and that the lower-level feature maps of SiamFC are already computed in the forward pass, in this paper we design a skip-layer connection network named Hyper-Siamese to aggregate the hierarchical feature maps of SiamFC and constitute hyper-feature representations of the target. The Hyper-Siamese network is trained end-to-end offline on the ILSVRC2015 dataset and later utilized for online tracking. By visualizing the outputs of different layers and comparing the tracking results under various concatenation modes of layers, we show that all the convolutional layers are useful for object tracking. Experimental results on the OTB100 and TC128 benchmarks demonstrate that our proposed algorithm performs favorably against not only the baseline tracker SiamFC (a 2.9% gain in OS rate and a 2.8% gain in DP rate on OTB100) but also many state-of-the-art trackers. Meanwhile, our proposed tracker achieves a real-time tracking speed (25 fps).
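The two ideas the abstract combines — aggregating feature maps from several convolutional layers into one hyper-feature, and matching a template against a search region by dense cross-correlation as in SiamFC — can be sketched in plain numpy. This is an illustrative simplification, not the authors' implementation: the function names, the nearest-neighbour upsampling, and the toy tensor shapes are assumptions, and the real network learns the aggregation weights end-to-end.

```python
import numpy as np

def aggregate_hyper_features(feature_maps, target_size):
    """Skip-layer aggregation: upsample each layer's feature map
    (nearest-neighbour) to a common spatial size, then concatenate
    along the channel axis to form the hyper-feature."""
    resized = []
    for f in feature_maps:                      # each f has shape (C, H, W)
        _, h, w = f.shape
        ry, rx = target_size[0] // h, target_size[1] // w
        resized.append(np.repeat(np.repeat(f, ry, axis=1), rx, axis=2))
    return np.concatenate(resized, axis=0)

def cross_correlate(template, search):
    """SiamFC-style matching: slide the template embedding over the
    search embedding and return the dense response map; the peak
    indicates the most likely target location."""
    c, th, tw = template.shape
    _, sh, sw = search.shape
    out = np.zeros((sh - th + 1, sw - tw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(template * search[:, y:y + th, x:x + tw])
    return out
```

For example, aggregating a shallow 4x4 map with 3 channels and a deep 2x2 map with 5 channels to a 4x4 grid yields an 8-channel hyper-feature; correlating a 2x2 template against a 5x5 search embedding yields a 4x4 response map whose peak sits where the template best matches.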

Keywords

SiamFC · Similar distractor · Semantics · Hyper-Siamese

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

  1. ATR Key Laboratory, National University of Defense Technology, Changsha, China