Abstract

In this chapter,\(^\dagger\) we present a deep model-based and data-driven hybrid architecture (DMD) for feature extraction. First, we construct a deep learning pipeline that progressively learns image features from simple to complex. We then combine this deep model-based pipeline with a data-driven pipeline, which extracts features from a large collection of unlabeled images. Sparse regularization is applied, in an unsupervised way, to the features extracted from both pipelines to obtain representative patches. Given these patches, a supervised learning algorithm is employed to perform object prediction. We describe how DMD works and explain, from the perspectives of both neuroscience and computational learning theory, why it works more effectively than traditional models.

†© ACM, 2010. This chapter is a minor revision of the author’s work with Zhiyu Wang and Dingyin Xia [1] published in VLS-MCMR’10. Permission to publish this chapter is granted under copyright license #2587600190581.
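
The data flow outlined in the abstract can be illustrated with a toy sketch. It is not the DMD implementation: every function name, the gradient filters standing in for the deep pipeline, the patch-matching scheme, the soft-thresholding step (the proximal operator of an L1 penalty, used here as a stand-in for sparse regularization), and the nearest-centroid classifier are assumptions chosen only to keep the example small and runnable.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def model_based_features(img):
    """Hypothetical model-based pipeline: simple edge responses, then max pooling."""
    gx = np.abs(np.diff(img, axis=1))  # "simple" stage: horizontal intensity changes
    gy = np.abs(np.diff(img, axis=0))  # "simple" stage: vertical intensity changes
    # "complex" stage: global max pooling gives tolerance to position
    return np.array([gx.max(), gy.max()])

def data_driven_features(img, patches):
    """Hypothetical data-driven pipeline: best match to patches taken from unlabeled images."""
    k = patches.shape[-1]
    windows = sliding_window_view(img, (k, k))        # (H-k+1, W-k+1, k, k)
    diff = windows[None] - patches[:, None, None]     # (P, H-k+1, W-k+1, k, k)
    dist = (diff ** 2).sum(axis=(-2, -1))             # squared distance per window
    return np.exp(-dist.min(axis=(1, 2)))             # one similarity score per patch

def soft_threshold(x, lam):
    """Proximal operator of the L1 penalty: shrinks small responses to exactly zero."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def extract(img, patches, lam=0.1):
    """Concatenate both pipelines, then sparsify (no labels are used here)."""
    feats = np.concatenate([model_based_features(img), data_driven_features(img, patches)])
    return soft_threshold(feats, lam)

def fit_centroids(X, y):
    """Supervised stage (assumed): one mean feature vector per class."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def predict(x, classes, centroids):
    """Assign the class whose centroid is closest to the feature vector."""
    return classes[np.argmin(np.linalg.norm(centroids - x, axis=1))]

# Toy data: vertical stripes (class 0) and horizontal stripes (class 1).
vert = np.tile([0.0, 1.0], (8, 4))
horiz = vert.T
patches = np.stack([vert[:3, :3], horiz[:3, :3]])  # stands in for patches from unlabeled images

X = np.stack([extract(vert, patches), extract(horiz, patches)])
y = np.array([0, 1])
classes, centroids = fit_centroids(X, y)
print(predict(extract(vert, patches), classes, centroids))
```

In this toy setting the thresholding step zeroes out the weak response to the non-matching patch, so each class is described by a few strong features drawn from both pipelines.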

Notes

  1. Disclaimer: We do not claim these heuristic-based features to be novel. Other heuristic-based features [27] may also be useful. What we consider to be important is that these features can augment model-based features to improve diversity before a principled theory can be formulated by neuroscientists to model cortex feedback/feedforward recursive signals.

References

  1. Z. Wang, D. Xia, E.Y. Chang, A deep model-based and data-driven hybrid architecture for image annotation, in Proceedings of ACM International Workshop on Very-Large-Scale Multimedia Corpus, Mining and Retrieval, pp. 13–18, 2010

  2. D.H. Hubel, T.N. Wiesel, Receptive fields and functional architecture of monkey striate cortex. J. Physiol. 195(1), 215–243 (1968)

  3. E. Miller, The prefrontal cortex and cognitive control. Nat. Rev. Neurosci. 1(1), 59–66 (2000)

  4. G. Potamianos, C. Neti, J. Luettin, I. Matthews, Audio–visual automatic speech recognition: an overview, in Issues in Visual and Audio–Visual Speech Processing (MIT Press, Cambridge, 2004)

  5. H. Lee, R. Grosse, R. Ranganath, A. Ng, Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations, in Proceedings of International Conference on Machine Learning (ICML), 2009

  6. T. Serre, Learning a dictionary of shape-components in visual cortex: comparison with neurons, humans and machines. Ph.D. Thesis, Massachusetts Institute of Technology, 2006

  7. M. Riesenhuber, T. Poggio, Are cortical models really bound by the binding problem? Neuron 24(1), 87–93 (1999)

  8. T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber, T. Poggio, Robust object recognition with cortex-like mechanisms. IEEE Trans. Pattern Anal. Mach. Intell. 29(3), 411–426 (2007)

  9. Y. Bengio, Learning Deep Architectures for AI (Now Publishers, 2009)

  10. G. Hinton, S. Osindero, Y. Teh, A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)

  11. Y. Bengio, Y. LeCun, Scaling learning algorithms towards AI, in Large-Scale Kernel Machines (MIT Press, Cambridge, 2007), pp. 321–360

  12. G. Loosli, S. Canu, L. Bottou, Training invariant support vector machines using selective sampling, in Large-Scale Kernel Machines (MIT Press, Cambridge, 2007), pp. 301–320

  13. M. Yasuda, T. Banno, H. Komatsu, Color selectivity of neurons in the posterior inferior temporal cortex of the macaque monkey. Cereb. Cortex 20(7), 1630–1646 (2009)

  14. K. Tsunoda, Y. Yamane, M. Nishizaki, M. Tanifuji, Complex objects are represented in macaque inferotemporal cortex by the combination of feature columns. Nat. Neurosci. 4, 832–838 (2001)

  15. I. Lampl, D. Ferster, T. Poggio, M. Riesenhuber, Intracellular measurements of spatial integration and the MAX operation in complex cells of the cat primary visual cortex. J. Neurophysiol. 92(5), 2704 (2004)

  16. T. Gawne, J. Martin, Responses of primate visual cortical V4 neurons to simultaneously presented stimuli. J. Neurophysiol. 88(3), 1128 (2002)

  17. D. Hubel, T. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J. Physiol. 160(1), 106 (1962)

  18. M. Ranzato, F. Huang, Y. Boureau, Y. LeCun, Unsupervised learning of invariant feature hierarchies with applications to object recognition, in Proceedings of IEEE CVPR, 2007

  19. J. Jones, L. Palmer, An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. J. Neurophysiol. 58(6), 1233 (1987)

  20. T. Serre, M. Riesenhuber, Realistic modeling of simple and complex cell tuning in the HMAX model, and implications for invariant object recognition in cortex. MIT technical report, 2004

  21. C. Ekanadham, S. Reader, H. Lee, Sparse deep belief net models for visual area V2, in Proceedings of NIPS, 2008

  22. B. Olshausen, C. Anderson, D. Van Essen, A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information. J. Neurosci. 13(11), 4700 (1993)

  23. D. Walther, T. Serre, T. Poggio, C. Koch, Modeling feature sharing between object detection and top-down attention. J. Vis. 5(8), 1041 (2005)

  24. S. Chikkerur, T. Serre, T. Poggio, A Bayesian inference theory of attention: neuroscience and algorithms. MIT technical report MIT-CSAIL-TR-2009-047, 2009

  25. E. Chang, B. Li, C. Li, Toward perception-based image retrieval. IEEE Content-Based Access of Image and Video Libraries, pp. 101–105, 2000

  26. S. Tong, E.Y. Chang, Support vector machine active learning for image retrieval, in Proceedings of ACM International Conference on Multimedia (ACM, New York, 2001), pp. 107–118

  27. R. Datta, D. Joshi, J. Li, J. Wang, Image retrieval: ideas, influences, and trends of the new age. ACM Comput. Surv. (CSUR) 40(2), 1–60 (2008)

  28. S. Coren, L.M. Ward, J.T. Enns, Sensation and Perception, 6th edn. (Wiley, New York, 2003)

  29. J. Leu, Computing a shape’s moments from its boundary. Pattern Recognit. 24(10), 949–957 (1991)

  30. H. Tamura, S. Mori, T. Yamawaki, Textural features corresponding to visual perception. IEEE Trans. Syst. Man Cybern. 8(6), 460–473 (1978)

  31. J. Smith, S. Chang, Automated image retrieval using color and texture. IEEE Trans. Pattern Anal. Mach. Intell. 1996

  32. P. Wu, B. Manjunath, S. Newsam, H. Shin, A texture descriptor for browsing and similarity retrieval. Sig. Process. Image Commun. 16(1–2), 33–43 (2000)

  33. W. Ma, H. Zhang, Benchmarking of image features for content-based retrieval, in Proceedings of Asilomar Conference on Signals Systems and Computers, pp. 253–260, 1998

  34. J. Deng, W. Dong, R. Socher, L. Li, K. Li, F.F. Li, ImageNet: a large-scale hierarchical image database, in Proceedings of IEEE CVPR, pp. 156–161, 2009

  35. A. Smeulders, M. Worring, S. Santini, A. Gupta, R. Jain, Content-based image retrieval at the end of the early years. IEEE Trans. Pattern Anal. Mach. Intell. 22(12), 1349–1380 (2000)

  36. Y. Ke, R. Sukthankar, PCA-SIFT: a more distinctive representation for local image descriptors, in Proceedings of IEEE CVPR, pp. 506–513, 2004

  37. O. Chapelle, P. Haffner, V. Vapnik, SVMs for histogram-based image classification. IEEE Trans. Neural Netw. 10(5), 1055 (1999)

  38. D. Blei, A. Ng, M. Jordan, Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)

  39. T. Serre, L. Wolf, T. Poggio, Object recognition with features inspired by visual cortex, in Proceedings of IEEE CVPR, 2005

  40. C. Gross, Visual functions of inferotemporal cortex. Handbook of Sensory Physiology, vol. 7(3), 1973

  41. C. Gross, C. Rocha-Miranda, D. Bender, Visual properties of neurons in inferotemporal cortex of the macaque. J. Neurophysiol. 35(1), 96–111 (1972)

  42. R. Salakhutdinov, A. Mnih, G. Hinton, Restricted Boltzmann machines for collaborative filtering, in Proceedings of International Conference on Machine Learning (ICML), pp. 791–798, 2007

Author information

Corresponding author

Correspondence to Edward Y. Chang.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg and Tsinghua University Press

About this chapter

Cite this chapter

Chang, E.Y. (2011). Perceptual Feature Extraction. In: Foundations of Large-Scale Multimedia Information Management and Retrieval. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20429-6_2

  • DOI: https://doi.org/10.1007/978-3-642-20429-6_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20428-9

  • Online ISBN: 978-3-642-20429-6

  • eBook Packages: Computer Science, Computer Science (R0)