Robust real-time visual object tracking via multi-scale fully convolutional Siamese networks
Abstract
Robust visual object tracking under occlusion and deformation is still a very challenging task. Existing trackers based on Convolutional Neural Networks (CNNs) either fail to handle these issues or run only at low speed. In this paper, we present a real-time tracker that is robust to occlusions and deformations, based on a Region-based, Multi-Scale Fully Convolutional Siamese Network (R-MSFCN). In the proposed R-MSFCN, the information of each region is extracted separately through position-sensitive score maps on multiple convolutional layers. Combining these score maps via adaptive weights leads to accurate localization of the target in a new frame. Experiments show that our method outperforms state-of-the-art approaches and handles object deformation and occlusion at about 31 FPS.
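The fusion step described in the abstract can be illustrated with a minimal sketch. It assumes the per-layer score maps have already been resized to a common resolution and that the adaptive weights are supplied as plain numbers; the abstract does not say how the weights are computed, so that part is left out. The names `fuse_score_maps` and `locate_peak` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

ScoreMap = List[List[float]]


def fuse_score_maps(score_maps: Sequence[ScoreMap],
                    weights: Sequence[float]) -> ScoreMap:
    """Weighted sum of same-sized score maps (weights normalised to sum to 1).

    Assumption: one score map per convolutional layer, all of equal size.
    """
    if not score_maps or len(score_maps) != len(weights):
        raise ValueError("need one weight per score map")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    rows, cols = len(score_maps[0]), len(score_maps[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for score_map, weight in zip(score_maps, weights):
        w = weight / total
        for r in range(rows):
            for c in range(cols):
                fused[r][c] += w * score_map[r][c]
    return fused


def locate_peak(fused: ScoreMap) -> Tuple[int, int]:
    """Return (row, col) of the first maximum of the fused map."""
    best_value, best_pos = fused[0][0], (0, 0)
    for r, row in enumerate(fused):
        for c, value in enumerate(row):
            if value > best_value:
                best_value, best_pos = value, (r, c)
    return best_pos


# Toy 3x3 maps standing in for responses from a shallow and a deep layer.
shallow = [[0.1, 0.2, 0.1], [0.2, 0.9, 0.3], [0.1, 0.2, 0.1]]
deep = [[0.0, 0.1, 0.0], [0.1, 0.4, 0.8], [0.0, 0.1, 0.0]]

peak_shallow_heavy = locate_peak(fuse_score_maps([shallow, deep], [3.0, 1.0]))
peak_deep_heavy = locate_peak(fuse_score_maps([shallow, deep], [1.0, 3.0]))
```

In this toy example the estimated target position moves from the centre cell to its right-hand neighbour when the weighting shifts from the shallow map to the deep one, which is the kind of effect adaptive weighting is meant to exploit.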
Keywords
Visual tracking, Region-based, Fully convolutional, Siamese-network, Deep learning
Acknowledgments
This work was supported in part by the Natural Science Foundation of China (No. 61231018), the National Science and Technology Support Program (2015BAH31F01), and the Program of Introducing Talents of Discipline to University under grant B13043.