Joint Representation and Truncated Inference Learning for Correlation Filter Based Tracking

  • Yingjie Yao
  • Xiaohe Wu
  • Lei Zhang
  • Shiguang Shan
  • Wangmeng Zuo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11213)


Correlation filter (CF) based trackers generally consist of two modules, i.e., feature representation and online model adaptation. In existing off-line deep learning models for CF trackers, model adaptation is usually either abandoned or restricted to a closed-form solution so that the deep representation can be learned in an end-to-end manner. However, such solutions fail to exploit the advances in CF models and cannot achieve competitive accuracy in comparison with state-of-the-art CF trackers. In this paper, we investigate the joint learning of deep representation and model adaptation, where an updater network is introduced for better tracking on the future frame by taking the current frame representation, the tracking result, and the last CF tracker as input. By modeling the representor as a convolutional neural network (CNN), we truncate the alternating direction method of multipliers (ADMM) and interpret it as a deep updater network, resulting in our model for learning representation and truncated inference (RTINet). Experiments demonstrate that our RTINet tracker achieves favorable tracking accuracy against state-of-the-art trackers, and its rapid version runs at a real-time speed of 24 fps. The code and pre-trained models will be publicly available.
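To make the truncation idea concrete, the following is a minimal sketch (not the authors' implementation) of a few ADMM stages solving a 1-D ridge-regression correlation filter in the Fourier domain. The splitting h = g, the penalty mu, and the fixed stage count are assumptions for illustration; in RTINet the stage-wise hyperparameters become learnable and the K unrolled stages are differentiated through together with the CNN representor.

```python
import numpy as np

def admm_cf(x, y, lam=0.1, mu=1.0, iters=5):
    """Truncated ADMM for a 1-D correlation filter in the Fourier domain.

    Approximately solves  min_h ||x * h - y||^2 + lam * ||h||^2
    (* denotes circular correlation) via the splitting h = g, running
    only `iters` ADMM stages instead of iterating to convergence.
    """
    xf = np.fft.fft(x)
    yf = np.fft.fft(y)
    a = np.conj(xf) * xf               # |x_hat|^2 per frequency
    c = np.conj(xf) * yf               # cross power spectrum
    h = np.zeros_like(xf)
    g = np.zeros_like(xf)
    l = np.zeros_like(xf)              # scaled dual variable
    for _ in range(iters):
        # filter update: elementwise closed form of the h-subproblem
        h = (c + mu * (g - l)) / (a + mu)
        # auxiliary-variable update: shrinkage from the ridge term
        g = mu * (h + l) / (lam + mu)
        # dual ascent on the constraint h = g
        l = l + h - g
    return np.real(np.fft.ifft(h))
```

With enough stages the iterate approaches the closed-form ridge solution conj(x_hat) * y_hat / (|x_hat|^2 + lam); truncating to a few stages trades exactness for a fixed, differentiable computation graph, which is what allows the inference procedure to be trained end-to-end.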


Keywords: Visual tracking · Correlation filters · Convolutional neural networks · Unrolled optimization



This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61671182 and 61471146.

Supplementary material

Supplementary material 1: 474192_1_En_34_MOESM1_ESM.pdf (5.9 MB)



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Harbin Institute of Technology, Harbin, China
  2. University of Pittsburgh, Pittsburgh, USA
  3. Institute of Computing Technology, CAS, Beijing, China
