Abstract
Visual tracking is an important task in computer vision. Recently, deep neural networks have gained significant attention thanks to their success in learning image features. However, the deep neural networks applied to visual tracking so far are complicated, fully connected architectures with a large number of redundant parameters, which makes them inefficient to learn. We tackle this problem with a novel convolutional deep belief network (CDBN) that uses convolution, weight sharing and pooling. It has far fewer parameters to learn and also gains translation invariance, which benefits tracker performance. Theoretical analysis and experimental evaluations on an open tracker benchmark demonstrate that, compared to DLT, another well-known deep learning tracker, our CDBN-based tracker is more accurate, improving the tracking success rate by 22.6 % and the tracking precision by 62.8 % on average, while keeping the computation cost low by reducing the number of parameters to 44.4 %. Meanwhile, our tracker can achieve real-time performance with a graphics processing unit (GPU) speedup of 2.61 times on average and up to 3.08 times.
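The parameter argument can be illustrated with a minimal sketch that counts the learnable parameters of one fully connected layer and one convolutional layer with shared weights. The function names and all layer sizes below are assumptions chosen for illustration; they are not taken from the paper and do not reproduce its 44.4 % figure, which refers to the whole network.

```python
def dense_params(n_visible: int, n_hidden: int) -> int:
    """Weights plus hidden biases of a fully connected layer."""
    return n_visible * n_hidden + n_hidden


def conv_params(kernel: int, in_channels: int, n_filters: int) -> int:
    """Shared filter weights plus one bias per filter of a convolutional layer."""
    return kernel * kernel * in_channels * n_filters + n_filters


# Hypothetical sizes, for illustration only (not the paper's architecture).
side = 32                                   # 32x32 grayscale input patch
dense = dense_params(side * side, 2560)     # every pixel connects to every hidden unit
conv = conv_params(kernel=5, in_channels=1, n_filters=24)  # 5x5 filters reused at every position

print(f"fully connected layer: {dense} parameters")
print(f"convolutional layer:   {conv} parameters")
print(f"conv / dense ratio:    {conv / dense:.6f}")
```

The convolutional count does not depend on the input size, because the same filter weights are applied at every image position. That reuse is the source of the savings the abstract attributes to weight sharing.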
References
Adam, A., Rivlin, E., Shimshoni, I.: Robust fragments-based tracking using the integral histogram. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2006)
Hare, S., Saffari, A., Torr, P.H.: Struck: structured output tracking with kernels. In: IEEE International Conference on Computer Vision (ICCV) (2011)
Yang, H., Shao, L., Zheng, F., Wang, L., Song, Z.: Recent advances and trends in visual tracking: a review. Neurocomputing 74(18), 3823–3831 (2011)
Smeulders, A., Chu, D., Cucchiara, R., Calderara, S., Dehghan, A., Shah, M.: Visual tracking: an experimental survey. IEEE Trans. Pattern Anal. Mach. Intell. 36(7), 1442–1468 (2014)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Annual Conference on Neural Information Processing Systems (NIPS) (2012)
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., Darrell, T.: DeCAF: a deep convolutional activation feature for generic visual recognition (2013). arXiv preprint arXiv:1310.1531
Hinton, G.E., Deng, L., Yu, D., Dahl, G., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., Kingsbury, B.: Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Process. Mag. 29(6), 82–97 (2012)
Sainath, T.N., Kingsbury, B., Saon, G., Soltau, H., Mohamed, A.R., Dahl, G., Ramabhadran, B.: Deep convolutional neural networks for large-scale speech tasks. Neural Netw. 64, 39–48 (2015)
Socher, R., Liu, C., Ng, A.: Parsing natural scenes and natural language with recursive neural networks. In: International Conference on Machine Learning (ICML) (2011)
Lee, H., Grosse, R., Ranganath, R., Ng, A.Y.: Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In: International Conference on Machine Learning (ICML) (2009)
Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
Hinton, G.E., Osindero, S., Teh, Y.W.: A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)
Wang, N., Yeung, D.Y.: Learning a deep compact image representation for visual tracking. In: Annual Conference on Neural Information Processing Systems (NIPS) (2013)
Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.A.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11, 3371–3408 (2010)
Wu, Y., Lim, J., Yang, M.: Online object tracking: a benchmark. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2013)
Torralba, A., Fergus, R., Freeman, W.T.: 80 Million tiny images: a large data set for nonparametric object and scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 30(11), 1958–1970 (2008)
Agrawal, P., Girshick, R., Malik, J.: Analyzing the performance of multilayer neural networks for object recognition. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014, Part VII. LNCS, vol. 8695, pp. 329–344. Springer, Heidelberg (2014)
LeCun, Y., Bengio, Y.: Convolutional networks for images, speech, and time-series. In: Arbib, M.A. (ed.) Handbook of Brain Theory and Neural Networks. MIT Press, Cambridge (1995)
Huang, G.B., Lee, H., Learned-Miller, E.: Learning hierarchical representations for face verification with convolutional deep belief networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2012)
Doucet, A., de Freitas, N., Gordon, N.: Sequential Monte Carlo Methods in Practice. Springer, New York (2001)
Arulampalam, M., Maskell, S., Gordon, N., Clapp, T.: A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Sig. Process. 50(2), 174–188 (2002)
Kalal, Z., Mikolajczyk, K., Matas, J.: Tracking-learning-detection. IEEE Trans. Pattern Anal. Mach. Intell. 34(7), 1409–1422 (2012)
Babenko, B., Yang, M., Belongie, S.: Robust object tracking with online multiple instance learning. IEEE Trans. Pattern Anal. Mach. Intell. 33(8), 1619–1632 (2011)
Ross, D., Lim, J., Lin, R., Yang, M.: Incremental learning for robust visual tracking. Int. J. Comput. Vis. 77(1), 125–141 (2008)
Zhang, K., Zhang, L., Yang, M.-H.: Real-time compressive tracking. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part III. LNCS, vol. 7574, pp. 864–877. Springer, Heidelberg (2012)
Kwon, J., Lee, K.: Visual tracking decomposition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2010)
Brosch, T., Tam, R.: Efficient training of convolutional deep belief networks in the frequency domain for application to high-resolution 2D and 3D images. Neural Comput. 27(1), 211–227 (2015)
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Hu, D., Zhou, X., Wu, J. (2015). Visual Tracking Based on Convolutional Deep Belief Network. In: Chen, Y., Ienne, P., Ji, Q. (eds) Advanced Parallel Processing Technologies. APPT 2015. Lecture Notes in Computer Science, vol 9231. Springer, Cham. https://doi.org/10.1007/978-3-319-23216-4_8
DOI: https://doi.org/10.1007/978-3-319-23216-4_8
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23215-7
Online ISBN: 978-3-319-23216-4
eBook Packages: Computer Science, Computer Science (R0)