Abstract
Matching-based tracking has drawn increasing interest in the object tracking field, among which the SiamFC tracker shows great potential for achieving both high accuracy and efficiency. However, the feature representation of the target in SiamFC is extracted from the last layer of the convolutional neural network and mainly captures semantic information, which makes SiamFC drift easily in the presence of similar distractors. Considering that the different layers of a convolutional neural network characterize the target from different perspectives, and that the lower-level feature maps of SiamFC are computed beforehand in any case, in this paper we design a skip-layer connection network named Hyper-Siamese to aggregate the hierarchical feature maps of SiamFC and constitute a hyper-feature representation of the target. The Hyper-Siamese network is trained end-to-end offline on the ILSVRC2015 dataset and then utilized for online tracking. By visualizing the outputs of different layers and comparing the tracking results under various layer-concatenation modes, we show that all the convolutional layers are useful for object tracking. Experimental results on the OTB100 and TC128 benchmarks demonstrate that our proposed algorithm performs favorably against not only the baseline tracker SiamFC (2.9% gain in OS rate and 2.8% gain in DP rate on OTB100) but also many state-of-the-art trackers. Meanwhile, our proposed tracker achieves real-time speed (25 fps).
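The core idea of the abstract — cropping the hierarchical feature maps to a common spatial size, concatenating them along the channel axis into a hyper-feature, and scoring the search region by cross-correlation with the template — can be sketched as follows. This is a minimal illustrative sketch in NumPy, not the paper's implementation; the function names (`center_crop`, `hyper_feature`, `xcorr`) and the center-crop alignment strategy are assumptions for illustration only.

```python
import numpy as np

def center_crop(fmap, size):
    # Crop a (C, H, W) feature map to (C, size, size) around its center,
    # so maps from different conv layers can be spatially aligned.
    _, h, w = fmap.shape
    top, left = (h - size) // 2, (w - size) // 2
    return fmap[:, top:top + size, left:left + size]

def hyper_feature(layer_maps):
    # Skip-layer aggregation: align every layer's map to the smallest
    # spatial size, then concatenate along the channel axis.
    size = min(f.shape[1] for f in layer_maps)
    return np.concatenate([center_crop(f, size) for f in layer_maps], axis=0)

def xcorr(template, search):
    # Dense cross-correlation of the template hyper-feature over the
    # search-region hyper-feature, producing a 2-D response (score) map.
    _, th, tw = template.shape
    _, sh, sw = search.shape
    out = np.zeros((sh - th + 1, sw - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(template * search[:, i:i + th, j:j + tw])
    return out
```

For example, aggregating three hypothetical layer outputs of shapes (16, 10, 10), (32, 8, 8) and (64, 6, 6) yields a (112, 6, 6) hyper-feature; correlating it against a (112, 12, 12) search-region hyper-feature yields a 7 × 7 response map whose peak indicates the target location.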
Cite this article
Kuai, Y., Wen, G. & Li, D. Hyper-Siamese network for robust visual tracking. SIViP 13, 35–42 (2019). https://doi.org/10.1007/s11760-018-1325-6