Structured Siamese Network for Real-Time Visual Tracking

  • Yunhua Zhang
  • Lijun Wang
  • Jinqing Qi
  • Dong Wang
  • Mengyang Feng
  • Huchuan Lu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11213)


Local structures of target objects are essential for robust tracking. However, existing methods based on deep neural networks mostly describe the target appearance from the global view, leading to high sensitivity to non-rigid appearance change and partial occlusion. In this paper, we circumvent this issue by proposing a local structure learning method, which simultaneously considers the local patterns of the target and their structural relationships for more accurate target tracking. To this end, a local pattern detection module is designed to automatically identify discriminative regions of the target objects. The detection results are further refined by a message passing module, which enforces the structural context among local patterns to construct local structures. We show that the message passing module can be formulated as the inference process of a conditional random field (CRF) and implemented by differentiable operations, allowing the entire model to be trained in an end-to-end manner. By considering various combinations of the local structures, our tracker is able to form various types of structure patterns. Target tracking is finally achieved by a matching procedure of the structure patterns between target template and candidates. Extensive evaluations on three benchmark data sets demonstrate that the proposed tracking algorithm performs favorably against state-of-the-art methods while running at a highly efficient speed of 45 fps.
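The abstract describes the message passing module as CRF inference realized with differentiable operations. As a rough illustration of that general idea only (not the paper's actual design: the function name, the 4-connected neighbourhood, and the single shared pairwise weight are all assumptions made for this sketch), a mean-field refinement of local-pattern score maps can be written with plain array operations, so gradients flow through it during end-to-end training:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mean_field_refine(unary, n_iters=3, pairwise_weight=0.5):
    """Illustrative differentiable mean-field CRF inference.

    unary: (H, W, K) logits, one score map per local pattern,
           as produced by a hypothetical local pattern detector.
    Each iteration aggregates beliefs from 4-connected neighbours
    (the message passing step) and folds them back into the unary
    term, so structurally consistent patterns reinforce each other.
    """
    q = softmax(unary)
    for _ in range(n_iters):
        # Message passing: sum neighbour beliefs via array shifts.
        msg = np.zeros_like(q)
        msg[1:] += q[:-1]          # from the neighbour above
        msg[:-1] += q[1:]          # from the neighbour below
        msg[:, 1:] += q[:, :-1]    # from the left neighbour
        msg[:, :-1] += q[:, 1:]    # from the right neighbour
        # Combine with the unary term and renormalise.
        q = softmax(unary + pairwise_weight * msg)
    return q
```

Because every step is a shift, add, or softmax, the same computation can be expressed in any autodiff framework, which is what makes this style of CRF inference trainable end to end.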


Keywords: Tracking · Deep learning · Siamese network



This work was supported by the Natural Science Foundation of China under Grants 61725202, 61751212, 61771088, 61632006, and 91538201.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. School of Information and Communication Engineering, Dalian University of Technology, Dalian, China
