Combining multiple views and temporal associations for 3-D object recognition

Massad, Amin; Mertsching, Bärbel; Schmalz, Steffen

doi:10.1007/BFb0054774

Conference paper
First Online: 01 January 2006

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1407)

Abstract

This article describes an architecture for the recognition of three-dimensional objects on the basis of viewer-centred representations and temporal associations. Considering evidence from psychophysics, neurophysiology, and computer science, we have decided to use a viewer-centred approach for the representation of three-dimensional objects. Even though this concept quite naturally suggests utilizing the temporal order of the views for learning and recognition, this aspect is often neglected. We therefore pay special attention to the evaluation of the temporal information and embed it into the conceptual framework of biological findings and computational advantages. The proposed recognition system consists of four stages and includes different kinds of artificial neural networks. Preprocessing is done by a Gabor-based wavelet transform. A Dynamic Link Matching algorithm, extended by several modifications, forms the second stage; it implements recognition and learning of the view classes. The temporal order of the views is recorded by a STORE network, which transforms the output for a presented sequence of views into an item-and-order coding. A subsequent Gaussian-ARTMAP architecture is used for the classification of the sequences and for their mapping onto object classes by means of supervised learning. The results achieved with this system show its capability to autonomously learn and recognize considerably similar objects. Furthermore, the given examples illustrate the benefits for object recognition stemming from the utilization of the temporal context: ambiguous views become manageable, and a higher degree of robustness against misclassifications can be accomplished.
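The role of the item-and-order coding can be illustrated with a small sketch. It is not the system described in the abstract: the recency-weighted encoder below is a simplified stand-in for the STORE network, a nearest-prototype rule replaces Gaussian ARTMAP, and all names (`item_and_order_code`, `classify`, `decay`, the toy view indices and object labels) are assumptions made for illustration only.

```python
def item_and_order_code(view_sequence, num_view_classes, decay=0.5):
    """Encode a sequence of view-class indices as one activation vector.

    Simplified stand-in for a STORE working memory (an assumption, not the
    paper's network): the most recent view gets activation 1.0 and each
    earlier view is scaled by `decay` per step back in time.
    """
    code = [0.0] * num_view_classes
    for steps_back, view in enumerate(reversed(view_sequence)):
        # If a view class repeats, keep its strongest (most recent) activation.
        code[view] = max(code[view], decay ** steps_back)
    return code


def squared_distance(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(code, prototypes):
    """Return the label whose stored code is closest (nearest prototype).

    This replaces the supervised Gaussian ARTMAP stage with a much simpler rule.
    """
    return min(prototypes, key=lambda label: squared_distance(code, prototypes[label]))


# Hypothetical toy data: view class 2 occurs for both objects, so it is
# ambiguous when presented on its own.
NUM_VIEWS = 4
prototypes = {
    "object_a": item_and_order_code([0, 2], NUM_VIEWS),
    "object_b": item_and_order_code([1, 2], NUM_VIEWS),
}

ambiguous = item_and_order_code([2], NUM_VIEWS)
with_context = item_and_order_code([1, 2], NUM_VIEWS)

# The single view is equally far from both prototypes; the sequence is not.
tie = (squared_distance(ambiguous, prototypes["object_a"])
       == squared_distance(ambiguous, prototypes["object_b"]))
label = classify(with_context, prototypes)
```

In this toy setting the isolated view 2 cannot separate the two objects, while the same view preceded by view 1 maps to `object_b`. Reversing a sequence also changes its code, which is the sense in which the vector carries order as well as item information.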

Author information

Authors and Affiliations

Dep. of Computer Science, AG IMA, University of Hamburg, Vogt-Kölln-Str. 30, D-22527, Hamburg, Germany
Amin Massad, Bärbel Mertsching & Steffen Schmalz

Editor information

Hans Burkhardt, Bernd Neumann

Cite this paper

Massad, A., Mertsching, B., Schmalz, S. (1998). Combining multiple views and temporal associations for 3-D object recognition. In: Burkhardt, H., Neumann, B. (eds) Computer Vision — ECCV’98. ECCV 1998. Lecture Notes in Computer Science, vol 1407. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0054774

DOI: https://doi.org/10.1007/BFb0054774
Published: 26 May 2006
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64613-6
Online ISBN: 978-3-540-69235-5
eBook Packages: Springer Book Archive
