2D recurrent neural networks: a high-performance tool for robust visual tracking in dynamic scenes

Masala, Giovanni; Casu, Filippo; Golosio, Bruno; Grosso, Enrico

doi:10.1007/s00521-017-3235-x

S.I.: EANN 2016
Published: 13 October 2017

Volume 29, pages 329–341, (2018)

Neural Computing and Applications

Giovanni Masala¹, Filippo Casu², Bruno Golosio² & Enrico Grosso²

Abstract

This paper proposes a novel method for robust visual tracking of arbitrary objects, based on the combination of image-based prediction and position refinement by weighted correlation. The effectiveness of the proposed approach is demonstrated on a challenging set of dynamic video sequences extracted from the triple jump final at the London 2012 Summer Olympics. The system is compared against five baseline tracking systems and shows markedly superior performance in all considered cases, which are characterized by a changing background and a large variety of articulated motions. The novel architecture, hereafter named 2D Recurrent Neural Network (2D-RNN), is derived from the well-known recurrent neural network model and adopts nearest-neighborhood connections between the input and context layers in order to store the temporal information content of the video. Starting from the selection of the object of interest in the first frame, neural computation is applied to predict the position of the target in each video frame. Normalized cross-correlation is then applied to refine the predicted target position. 2D-RNN ensures limited complexity, great adaptability and a very fast learning time. On the considered dataset it also shows fast execution times and very good accuracy, making this approach an excellent candidate for automated analysis of complex video streams.
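The refinement step described in the abstract can be illustrated with a minimal sketch. It assumes grayscale frames stored as NumPy arrays, a fixed template taken from the first frame, and a small square search window around the predicted position. The function names `ncc` and `refine_position`, the `radius` parameter, and the exhaustive window search are illustrative assumptions, not the authors' implementation. The sketch also uses plain normalized cross-correlation rather than the weighted variant mentioned in the abstract, whose weighting scheme is not specified here.

```python
import numpy as np


def ncc(patch: np.ndarray, template: np.ndarray) -> float:
    """Normalized cross-correlation of two equally sized grayscale patches."""
    p = patch.astype(float) - patch.mean()
    t = template.astype(float) - template.mean()
    denom = np.sqrt((p * p).sum() * (t * t).sum())
    # A flat patch has zero variance; treat it as uncorrelated.
    return float((p * t).sum() / denom) if denom > 0 else 0.0


def refine_position(frame, template, predicted, radius=4):
    """Refine a predicted top-left (row, col) position of the target.

    Hypothetical helper: scans every offset within `radius` pixels of
    `predicted` and keeps the position with the highest NCC score.
    """
    h, w = template.shape
    best_score, best_pos = -np.inf, predicted
    for r in range(predicted[0] - radius, predicted[0] + radius + 1):
        for c in range(predicted[1] - radius, predicted[1] + radius + 1):
            # Skip candidate windows that fall outside the frame.
            if r < 0 or c < 0 or r + h > frame.shape[0] or c + w > frame.shape[1]:
                continue
            score = ncc(frame[r:r + h, c:c + w], template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score


# Usage: the template is cut from a synthetic frame, and the "predicted"
# position is deliberately off by a few pixels.
rng = np.random.default_rng(0)
frame = rng.random((64, 64))
template = frame[20:36, 30:46].copy()
position, score = refine_position(frame, template, predicted=(18, 33), radius=4)
```

In a tracker along these lines, `predicted` would come from the network's output for the current frame, and the refined position would serve as the final estimate for that frame.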



Author information

Authors and Affiliations

School of Computing, Electronics and Mathematics, Plymouth University, Portland Square, Drake Circus, Plymouth, PL4 8AA, UK
Giovanni Masala
Department of Political Science, Communication, Engineering and Information Technologies, Computer Vision Laboratory, University of Sassari, Viale Mancini, 5, 07100, Sassari, Italy
Filippo Casu, Bruno Golosio & Enrico Grosso

Corresponding author

Correspondence to Giovanni Masala.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Masala, G., Casu, F., Golosio, B. et al. 2D recurrent neural networks: a high-performance tool for robust visual tracking in dynamic scenes. Neural Comput & Applic 29, 329–341 (2018). https://doi.org/10.1007/s00521-017-3235-x

Received: 20 December 2016
Accepted: 04 October 2017
Published: 13 October 2017
Issue Date: April 2018
DOI: https://doi.org/10.1007/s00521-017-3235-x
