Robust Online Visual Tracking with a Single Convolutional Neural Network

Li, Hanxi; Li, Yi; Porikli, Fatih

doi:10.1007/978-3-319-16814-2_13


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9007)

Included in the following conference series:

Asian Conference on Computer Vision

Abstract

Deep neural networks, despite their great success in feature learning for various computer vision tasks, are usually considered impractical for online visual tracking because they require very long training times and a large number of training samples. In this work, we present an efficient and very robust online tracking algorithm that uses a single Convolutional Neural Network (CNN) to learn effective feature representations of the target object over time. Our contributions are threefold. First, we introduce a novel truncated structural loss function that maintains as many training samples as possible and reduces the risk of tracking error accumulation, and thus drift, by accommodating the uncertainty of the model output. Second, we enhance the ordinary Stochastic Gradient Descent approach to CNN training with a temporal selection mechanism, which generates positive and negative samples within different time periods. Finally, we propose updating the CNN model in a “lazy” style to speed up the training stage: the network is updated only when a significant appearance change occurs on the object, without sacrificing tracking accuracy. The CNN tracker outperforms all compared state-of-the-art methods in our extensive evaluations, which involve 18 well-known benchmark video sequences.
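The abstract does not give the form of the truncated loss or the trigger for a “lazy” update, so the following is only a minimal sketch of how the two ideas could fit together. It assumes the truncation zeroes any L2 error at or below a threshold `beta`, and that retraining is triggered when the mean truncated loss over recent samples exceeds `trigger`. The function names and both threshold values are hypothetical and are not taken from the paper.

```python
import math

def truncated_l2(pred, target, beta):
    """L2 error between two score vectors, zeroed when it is at most beta.

    Assumption: small errors are treated as model uncertainty and ignored.
    """
    err = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)))
    return err if err > beta else 0.0

def should_update(preds, targets, beta=0.2, trigger=0.1):
    """Hypothetical lazy rule: retrain only when the mean truncated loss
    over the given samples exceeds `trigger`."""
    losses = [truncated_l2(p, t, beta) for p, t in zip(preds, targets)]
    return sum(losses) / len(losses) > trigger

# Usage: one confident sample and one poorly fitted sample.
preds = [[0.9, 0.1], [0.5, 0.5]]
targets = [[1.0, 0.0], [1.0, 0.0]]
update_now = should_update(preds, targets)
```

Under these assumptions, samples that the current model already fits well contribute nothing to the loss, so the network is left untouched until the target's appearance has changed enough to push the mean loss past the trigger.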

Notes

1. Two parameters \(r_{\mu }\) and \(r_{\sigma }\) determine a local contrast normalization process. In this work, we use three configurations: \(\{r_{\mu } = 3, r_{\sigma } = 1\}\), \(\{r_{\mu } = 3, r_{\sigma } = 3\}\) and \(\{r_{\mu } = 5, r_{\sigma } = 5\}\).
2. Here we follow the labeling style of conventional CNN training.
3. In this paper \(o = 3\), i.e., the bounding box changes in its location and scale.
4. \(s = h / 32\), where \(h\) is the object’s height.
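The notes name the normalization parameters but do not say how they enter the computation. As a rough illustration only, the sketch below assumes that \(r_{\mu }\) is the radius of the window used to subtract a local mean and \(r_{\sigma }\) is the radius of the window used to divide by a local standard deviation. The box-window choice, the border clipping, the `eps` term, and all function names are assumptions made for this example. The last helper simply evaluates the formula in note 4.

```python
import math

# The three (r_mu, r_sigma) configurations listed in note 1.
LCN_CONFIGS = [(3, 1), (3, 3), (5, 5)]

def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) window, clipped at the image border."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - r), min(h, y + r + 1))
            xs = range(max(0, x - r), min(w, x + r + 1))
            vals = [img[j][i] for j in ys for i in xs]
            out[y][x] = sum(vals) / len(vals)
    return out

def local_contrast_normalize(img, r_mu, r_sigma, eps=1e-6):
    """Subtract a local mean (radius r_mu), then divide by a local
    standard deviation (radius r_sigma). Radii are an assumed reading."""
    h, w = len(img), len(img[0])
    mu = box_mean(img, r_mu)
    centered = [[img[y][x] - mu[y][x] for x in range(w)] for y in range(h)]
    var = box_mean([[v * v for v in row] for row in centered], r_sigma)
    return [[centered[y][x] / (math.sqrt(var[y][x]) + eps) for x in range(w)]
            for y in range(h)]

def step_from_height(h):
    """Note 4: s = h / 32, where h is the object's height."""
    return h / 32

# Usage: one normalized copy of a small grayscale patch per configuration.
patch = [[float((x + y) % 4) for x in range(8)] for y in range(8)]
channels = [local_contrast_normalize(patch, r_mu, r_sigma)
            for r_mu, r_sigma in LCN_CONFIGS]
```

With three configurations, this yields three normalized versions of the same patch, each the same size as the input.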

Author information

Authors and Affiliations

Canberra Research Laboratory, NICTA, Sydney, Australia
Hanxi Li, Yi Li & Fatih Porikli
Research School of Engineering, Australian National University, Canberra, Australia
Yi Li & Fatih Porikli
School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, China
Hanxi Li

Corresponding author

Correspondence to Hanxi Li.

Editor information

Editors and Affiliations

Technische Universität München, Garching, Germany
Daniel Cremers
University of Adelaide, Adelaide, South Australia, Australia
Ian Reid
Keio University, Yokohama, Kanagawa, Japan
Hideo Saito
University of California at Merced, Merced, California, USA
Ming-Hsuan Yang

About this paper

Cite this paper

Li, H., Li, Y., Porikli, F. (2015). Robust Online Visual Tracking with a Single Convolutional Neural Network. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision -- ACCV 2014. ACCV 2014. Lecture Notes in Computer Science, vol 9007. Springer, Cham. https://doi.org/10.1007/978-3-319-16814-2_13

DOI: https://doi.org/10.1007/978-3-319-16814-2_13
Published: 17 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16813-5
Online ISBN: 978-3-319-16814-2
eBook Packages: Computer Science, Computer Science (R0)
