Abstract
We present a biologically inspired model for learning prototypical representations of head poses. The model employs populations of integrate-and-fire neurons and operates in the temporal domain. Times to first spike (latencies) are used to build a rank-order code, which is invariant to global changes in contrast and brightness. The model consists of three layers. In the first layer, populations of Gabor filters extract feature maps from the input image. Filter activities are converted into spike latencies, which determine the temporal order of the spikes. In the second layer, intermediate-level neurons respond selectively to feature combinations that are statistically significant in the presented image dataset. Synaptic connectivity between layers 1 and 2 is adapted by spike-timing dependent plasticity (STDP). This mechanism realises an unsupervised Hebbian learning scheme that modifies synaptic weights according to the relative timing of pre- and postsynaptic spikes. The third layer employs a radial basis function (RBF) classifier to evaluate the neural responses of layer 2. Our quantitative results show that the network performs well in discriminating between 9 different input poses gathered from 200 subjects.
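The rank-order idea and the STDP weight change described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The linear latency mapping, the modelling of a contrast change as a positive gain and a brightness change as an additive offset on the filter responses, and the soft-bounded update rule with its parameters (`a_plus`, `a_minus`, `n_early`) are all assumptions made for illustration.

```python
def latencies(activities, t_max=1.0):
    """Map filter activities to spike latencies: stronger response, earlier spike.

    Assumption: a linear min-max mapping; the paper's conversion may differ.
    """
    lo, hi = min(activities), max(activities)
    span = hi - lo
    if span == 0:
        return [t_max for _ in activities]
    return [t_max * (hi - a) / span for a in activities]


def rank_order(activities):
    """Return unit indices sorted by firing order (earliest spike first)."""
    lat = latencies(activities)
    return sorted(range(len(lat)), key=lambda i: lat[i])


def stdp_update(weights, order, n_early, a_plus=0.05, a_minus=0.03):
    """One simplified STDP step (hypothetical rule, for illustration only).

    Assumption: the postsynaptic neuron fires after the first `n_early`
    inputs. Those inputs (pre before post) are potentiated, all others
    are depressed. The factor w * (1 - w) keeps weights inside [0, 1].
    """
    early = set(order[:n_early])
    updated = []
    for i, w in enumerate(weights):
        step = a_plus if i in early else -a_minus
        updated.append(w + step * w * (1.0 - w))
    return updated


# Hypothetical responses of four Gabor filters to one image patch.
responses = [0.2, 0.9, 0.5, 0.1]
order = rank_order(responses)

# The same patch under a global gain (contrast) and offset (brightness).
rescaled = [2.5 * r + 0.3 for r in responses]
same_order = rank_order(rescaled) == order

# The two earliest inputs are strengthened, the remaining two weakened.
weights = stdp_update([0.5] * 4, order, n_early=2)
```

Because only the order of the spikes is used, any transformation that preserves the ordering of the filter responses leaves the code unchanged, which is why `same_order` holds here. Repeating `stdp_update` over many images would, under these assumptions, concentrate weight on inputs that consistently fire early.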
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Weidenbacher, U., Neumann, H. (2008). Unsupervised Learning of Head Pose through Spike-Timing Dependent Plasticity. In: André, E., Dybkjær, L., Minker, W., Neumann, H., Pieraccini, R., Weber, M. (eds) Perception in Multimodal Dialogue Systems. PIT 2008. Lecture Notes in Computer Science, vol 5078. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69369-7_14
DOI: https://doi.org/10.1007/978-3-540-69369-7_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69368-0
Online ISBN: 978-3-540-69369-7
eBook Packages: Computer Science, Computer Science (R0)