Abstract
In contrast to the one-dimensional structure of natural language, images consist of two- or three-dimensional structures. This difference in dimensionality makes mapping between words and images a challenging, poorly understood, and undertheorized task. In this paper, we present a general theoretical framework for semantic visual abstraction in massive image databases, applied specifically to facial identification and the visual search that supports it. The framework accommodates the now commonplace observation that, through a graph-based visual abstraction, language allows humans to categorize objects and to annotate shapes verbally. It assumes a hidden layer between facial features and the expressive words that refer to them. This hidden layer contains key points of correspondence that can be articulated mathematically, visually, or verbally. We design a semantic visual abstraction network for efficient facial recognition in massive visual datasets and demonstrate that a two-way mapping between words and facial shapes is feasible for facial information retrieval and reconstruction.
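As a purely illustrative reading of the two-way mapping described in the abstract, the sketch below treats the hidden layer as a small table of key points shared by descriptive words and facial shapes. It is a toy under stated assumptions, not the framework's actual network: the vocabulary, the coordinates, and the function names `words_to_shape`, `shape_to_words`, and `shape_distance` are all hypothetical and do not come from the paper.

```python
import math

# Hypothetical hidden layer: each descriptive word is tied to a few 2D key
# points in normalized face coordinates. All values are invented for
# illustration and are not taken from the paper.
WORD_TO_KEYPOINTS = {
    "round chin": [(0.35, 0.90), (0.50, 0.98), (0.65, 0.90)],
    "pointed chin": [(0.42, 0.88), (0.50, 1.00), (0.58, 0.88)],
    "square chin": [(0.30, 0.92), (0.50, 0.95), (0.70, 0.92)],
}


def shape_distance(a, b):
    """Mean Euclidean distance between corresponding key points."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def words_to_shape(word):
    """Word -> shape: look up the key points a word refers to."""
    return WORD_TO_KEYPOINTS[word]


def shape_to_words(keypoints):
    """Shape -> word: return the word whose key points are nearest."""
    return min(
        WORD_TO_KEYPOINTS,
        key=lambda w: shape_distance(WORD_TO_KEYPOINTS[w], keypoints),
    )


# Usage: a slightly perturbed chin contour is described verbally, and the
# verbal description is mapped back to a canonical shape (reconstruction).
observed = [(0.41, 0.89), (0.50, 0.99), (0.59, 0.87)]
label = shape_to_words(observed)
reconstructed = words_to_shape(label)
```

A nearest-neighbor lookup is used here only because it is the simplest way to make the shape-to-word direction concrete; the abstract does not specify how the correspondence is actually computed.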
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cai, Y., Kaufer, D., Hart, E., Solomon, E. (2009). Semantic Visual Abstraction for Face Recognition. In: Allen, G., Nabrzyski, J., Seidel, E., van Albada, G.D., Dongarra, J., Sloot, P.M.A. (eds) Computational Science – ICCS 2009. Lecture Notes in Computer Science, vol 5544. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01970-8_41
DOI: https://doi.org/10.1007/978-3-642-01970-8_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01969-2
Online ISBN: 978-3-642-01970-8
eBook Packages: Computer Science (R0)