Abstract
In this paper we design and evaluate methods for exploiting the temporal coherence present in video data for the task of instance-level object recognition. First, we evaluate the performance and generalisation capabilities of a Convolutional Neural Network for learning individual objects from multiple viewpoints taken from a video sequence. Then, we exploit the assumption that in video data the same object remains present over a number of consecutive frames. Knowing this number of consecutive frames a priori is difficult, however, especially for mobile agents interacting with objects in front of them. Thus, we evaluate the use of temporal filters, such as the Cumulative Moving Average, and a machine learning approach using Recurrent Neural Networks for this task. We also show that by exploiting temporal coherence, models trained with only a few data points perform comparably to those trained on the whole dataset.
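The Cumulative Moving Average filter mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes per-frame class probabilities (e.g. CNN softmax outputs) and averages them over all frames seen so far, so a single noisy frame is outvoted by the accumulated evidence. The helper name `cma_predict` is ours.

```python
import numpy as np

def cma_predict(frame_probs):
    """Cumulative moving average over per-frame class probabilities.

    frame_probs: (T, C) array of per-frame softmax outputs.
    Returns the predicted class index at each frame after smoothing.
    """
    # Running mean of the probability vectors up to each frame t.
    counts = np.arange(1, len(frame_probs) + 1)[:, None]
    cma = np.cumsum(frame_probs, axis=0) / counts
    return cma.argmax(axis=1)

# Toy example: three frames, two classes; the noisy middle frame
# is outweighed by the accumulated average.
probs = np.array([[0.9, 0.1],
                  [0.4, 0.6],
                  [0.8, 0.2]])
print(cma_predict(probs))  # -> [0 0 0]
```

Note that a raw per-frame argmax would flip to class 1 on the middle frame, whereas the CMA keeps the prediction stable across the sequence.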
This work was funded in part by the Mexican scientific agency Consejo Nacional de Ciencia y Tecnologia (CONACyT).
Copyright information
© 2018 Springer Nature Switzerland AG
Cite this paper
Lagunes-Fortiz, M., Damen, D., Mayol-Cuevas, W. (2018). Instance-level Object Recognition Using Deep Temporal Coherence. In: Bebis, G., et al. (eds.) Advances in Visual Computing. ISVC 2018. Lecture Notes in Computer Science, vol. 11241. Springer, Cham. https://doi.org/10.1007/978-3-030-03801-4_25
DOI: https://doi.org/10.1007/978-3-030-03801-4_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03800-7
Online ISBN: 978-3-030-03801-4
eBook Packages: Computer Science (R0)