Abstract
In this chapter,\(^\dagger\) we present a deep model-based and data-driven hybrid architecture (DMD) for feature extraction. First, we construct a deep learning pipeline that learns image features progressively, from simple to complex. We then combine this deep model-based pipeline with a data-driven pipeline, which extracts features from a large collection of unlabeled images. Sparse regularization is performed, in an unsupervised way, on the features extracted from both pipelines to obtain representative patches. Once these patches are obtained, a supervised learning algorithm is employed for object prediction. We present how DMD works and explain, from the perspectives of both neuroscience and computational learning theory, why it works more effectively than traditional models.
†© ACM, 2010. This chapter is a minor revision of the author’s work with Zhiyu Wang and Dingyin Xia [1] published in VLS-MCMR’10. Permission to publish this chapter is granted under copyright license #2587600190581.
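The flow described in the abstract (two feature pipelines, a sparsity step, then a supervised predictor) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the chapter: the function names are hypothetical, the model-based pipeline is reduced to an edge filter followed by max pooling, the data-driven pipeline to a Gaussian similarity against stored patches, sparse regularization to plain soft thresholding, and the supervised stage to a nearest-centroid classifier.

```python
import numpy as np

def model_based_features(image, pool=4):
    """Stand-in for the deep pipeline: edge responses (simple), then max pooling (complex)."""
    gx = np.abs(np.diff(image, axis=1))[:-1, :]  # horizontal differences, shape (H-1, W-1)
    gy = np.abs(np.diff(image, axis=0))[:, :-1]  # vertical differences, shape (H-1, W-1)
    edges = gx + gy
    h = (edges.shape[0] // pool) * pool
    w = (edges.shape[1] // pool) * pool
    blocks = edges[:h, :w].reshape(h // pool, pool, w // pool, pool)
    return blocks.max(axis=(1, 3)).ravel()       # one maximum per pool x pool block

def data_driven_features(image, patches):
    """Stand-in for the data-driven pipeline: best match of each stored patch in the image."""
    p = patches.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(image, (p, p))
    windows = windows.reshape(-1, p * p)         # every p x p window, flattened
    flat = patches.reshape(len(patches), -1)
    d2 = ((windows[:, None, :] - flat[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2.min(axis=0))               # 1.0 means an exact match somewhere

def soft_threshold(x, lam):
    """Proximal operator of the L1 norm: shrinks values and zeroes the small ones."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def hybrid_features(image, patches, lam=0.1):
    """Concatenate both pipelines, then sparsify (a simplification of sparse regularization)."""
    f = np.concatenate([model_based_features(image),
                        data_driven_features(image, patches)])
    return soft_threshold(f, lam)

def fit_centroids(X, y):
    """Supervised stage (assumed): one mean feature vector per class."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def predict(X, classes, centroids):
    """Assign each row of X to the class with the nearest centroid."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d2.argmin(axis=1)]
```

In this sketch `patches` plays the role of patches sampled from unlabeled images; a real system would learn both the filters and the patch dictionary rather than fix them by hand.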
Notes
1. Disclaimer: We do not claim these heuristic-based features to be novel. Other heuristic-based features [27] may also be useful. What we consider to be important is that these features can augment model-based features to improve diversity before a principled theory can be formulated by neuroscientists to model cortex feedback/feedforward recursive signals.
References
Z. Wang, D. Xia, E.Y. Chang, A deep model-based and data-driven hybrid architecture for image annotation, in Proceedings of ACM International Workshop on Very-Large-Scale Multimedia Corpus, Mining and Retrieval, pp. 13–18, 2010
D.H. Hubel, T.N. Wiesel, Receptive fields and functional architecture of monkey striate cortex. J. Physiol. 195(1), 215–243 (1968)
E. Miller, The prefrontal cortex and cognitive control. Nat. Rev. Neurosci. 1(1), 59–66 (2000)
G. Potamianos, C. Neti, J. Luettin, I. Matthews, Audio–visual automatic speech recognition: an overview, in Issues in Visual and Audio–Visual Speech Processing (MIT Press, Cambridge, 2004)
H. Lee, R. Grosse, R. Ranganath, A. Ng, Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations, in Proceedings of International Conference on Machine Learning (ICML), 2009
T. Serre, Learning a dictionary of shape-components in visual cortex: comparison with neurons, humans and machines. Ph.D. Thesis, Massachusetts Institute of Technology, 2006
M. Riesenhuber, T. Poggio, Are cortical models really bound by the binding problem? Neuron 24(1), 87–93 (1999)
T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber, T. Poggio, Robust object recognition with cortex-like mechanisms. IEEE Trans. Pattern Anal. Mach. Intell. 29(3), 411–426 (2007)
Y. Bengio, Learning Deep Architectures for AI (Now Publishers, 2009)
G. Hinton, S. Osindero, Y. Teh, A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)
Y. Bengio, Y. LeCun, Scaling learning algorithms towards AI, in Large-Scale Kernel Machines (MIT Press, Cambridge, 2007), pp. 321–360
G. Loosli, S. Canu, L. Bottou, Training invariant support vector machines using selective sampling, in Large-Scale Kernel Machines (MIT Press, Cambridge, 2007), pp. 301–320
M. Yasuda, T. Banno, H. Komatsu, Color selectivity of neurons in the posterior inferior temporal cortex of the macaque monkey. Cereb. Cortex 20(7), 1630–1646 (2009)
K. Tsunoda, Y. Yamane, M. Nishizaki, M. Tanifuji, Complex objects are represented in macaque inferotemporal cortex by the combination of feature columns. Nat. Neurosci. 4, 832–838 (2001)
I. Lampl, D. Ferster, T. Poggio, M. Riesenhuber, Intracellular measurements of spatial integration and the MAX operation in complex cells of the cat primary visual cortex. J. Neurophysiol. 92(5), 2704 (2004)
T. Gawne, J. Martin, Responses of primate visual cortical V4 neurons to simultaneously presented stimuli. J. Neurophysiol. 88(3), 1128 (2002)
D. Hubel, T. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J. Physiol. 160(1), 106 (1962)
M. Ranzato, F. Huang, Y. Boureau, Y. LeCun, Unsupervised learning of invariant feature hierarchies with applications to object recognition, in Proceedings of IEEE CVPR, 2007
J. Jones, L. Palmer, An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. J. Neurophysiol. 58(6), 1233 (1987)
T. Serre, M. Riesenhuber, Realistic modeling of simple and complex cell tuning in the HMAX model, and implications for invariant object recognition in cortex. MIT technical report, 2004
C. Ekanadham, S. Reader, H. Lee, Sparse deep belief net models for visual area V2, in Proceedings of NIPS, 2008
B. Olshausen, C. Anderson, D. Van Essen, A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information. J. Neurosci. 13(11), 4700 (1993)
D. Walther, T. Serre, T. Poggio, C. Koch, Modeling feature sharing between object detection and top-down attention. J. Vis. 5(8), 1041 (2005)
S. Chikkerur, T. Serre, T. Poggio, A Bayesian inference theory of attention: neuroscience and algorithms. MIT technical report MIT-CSAIL-TR-2009-047, 2009
E. Chang, B. Li, C. Li, Toward perception-based image retrieval, in IEEE Content-Based Access of Image and Video Libraries, pp. 101–105, 2000
S. Tong, E.Y. Chang, Support vector machine active learning for image retrieval, in Proceedings of ACM International Conference on Multimedia (ACM, New York, 2001), pp. 107–118
R. Datta, D. Joshi, J. Li, J. Wang, Image retrieval: ideas, influences, and trends of the new age. ACM Comput. Surv. (CSUR) 40(2), 1–60 (2008)
S. Coren, L.M. Ward, J.T. Enns, Sensation and Perception, 6th edn. (Wiley, New York, 2003)
J. Leu, Computing a shape’s moments from its boundary. Pattern Recognit. 24(10), 949–957 (1991)
H. Tamura, S. Mori, T. Yamawaki, Textural features corresponding to visual perception. IEEE Trans. Syst. Man Cybern. 8(6), 460–473 (1978)
J. Smith, S. Chang, Automated image retrieval using color and texture. IEEE Trans. Pattern Anal. Mach. Intell. 1996
P. Wu, B. Manjunath, S. Newsam, H. Shin, A texture descriptor for browsing and similarity retrieval. Sig. Process. Image Commun. 16(1-2), 33–43 (2000)
W. Ma, H. Zhang, Benchmarking of image features for content-based retrieval, in Proceedings of Asilomar Conference on Signals Systems and Computers, pp. 253–260, 1998
J. Deng, W. Dong, R. Socher, L. Li, K. Li, F.F. Li, Imagenet: a large-scale hierarchical image database, in Proceedings of IEEE CVPR, pp. 156–161, 2009
A. Smeulders, M. Worring, S. Santini, A. Gupta, R. Jain, Content-based image retrieval at the end of the early years. IEEE Trans. Pattern Anal. Mach. Intell. 22(12), 1349–1380 (2000)
Y. Ke, R. Sukthankar, PCA-SIFT: a more distinctive representation for local image descriptors, in Proceedings of IEEE CVPR, pp. 506–513, 2004
O. Chapelle, P. Haffner, V. Vapnik, SVMs for histogram-based image classification. IEEE Trans. Neural Netw. 10(5), 1055 (1999)
D. Blei, A. Ng, M. Jordan, Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
T. Serre, L. Wolf, T. Poggio, Object recognition with features inspired by visual cortex, in Proceedings of IEEE CVPR, 2005
C. Gross, Visual functions of inferotemporal cortex. Handbook of Sensory Physiology, vol 7(3), 1973
C. Gross, C. Rocha-Miranda, D. Bender, Visual properties of neurons in inferotemporal cortex of the macaque. J. Neurophysiol. 35(1), 96–111 (1972)
R. Salakhutdinov, A. Mnih, G. Hinton, Restricted Boltzmann machines for collaborative filtering, in Proceedings of International Conference on Machine Learning (ICML), pp. 791–798, 2007
© 2011 Springer-Verlag Berlin Heidelberg and Tsinghua University Press
About this chapter
Cite this chapter
Chang, E.Y. (2011). Perceptual Feature Extraction. In: Foundations of Large-Scale Multimedia Information Management and Retrieval. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20429-6_2
DOI: https://doi.org/10.1007/978-3-642-20429-6_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20428-9
Online ISBN: 978-3-642-20429-6
eBook Packages: Computer Science, Computer Science (R0)