Algorithmic Optimizations in the HMAX Model Targeted for Efficient Object Recognition

Bitar, Ahmad W.; Mansour, Mohamad M.; Chehab, Ali

doi:10.1007/978-3-319-29971-6_20

Part of the book series: Communications in Computer and Information Science (CCIS, volume 598)

Included in the following conference series:

International Joint Conference on Computer Vision, Imaging and Computer Graphics

Abstract

In this paper, we propose several approximations aimed at increasing the accuracy of the S1, C1 and S2 layers of the original Gray HMAX model of the visual cortex. At layer S1, an image is convolved with 64 separable Gabor filters in the spatial domain after removing some irrelevant information such as illumination and expression variations. At layer C1, some of the minimum scale values are exploited in addition to the maximum ones in order to increase the model's accuracy: using additive-domain embedding, these minimum values are embedded into their corresponding maximum values according to a weight between 0 and 1. At layer S2, we apply clustering, one of the most active research areas in data mining, to improve the way prototypes are selected during the feature learning stage. This is achieved with the Partitioning Around Medoids (PAM) clustering algorithm. The impact of these approximations on accuracy and computational complexity was evaluated on the Caltech101 dataset, which contains 9,145 images split between 101 distinct object categories plus a background category, and compared with the baseline performance using support vector machine (SVM) and nearest neighbor (NN) classifiers. The results show that our model improves accuracy at the S1 layer by more than 10% while also reducing computational complexity. The approximations at the C1 and S2 layers each yield a slight increase in accuracy.
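
The C1 and S2 ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it is hypothetical. The abstract does not give the embedding formula, so `embed_min_into_max` assumes the simplest additive form, max + weight * min, over the scales of one band. The `pam` function is a simplified swap-based variant of Partitioning Around Medoids that starts from the first k points rather than from PAM's BUILD step. Its medoids are always actual data points, which is what makes the algorithm a natural fit for selecting prototype patches.

```python
import numpy as np


def embed_min_into_max(responses, weight):
    """Pool S1 responses over the scales of one band.

    responses: array of shape (n_scales, H, W).
    weight: value in [0, 1]; 0 reproduces plain max pooling over scales.
    ASSUMPTION: the embedding is max + weight * min. The abstract only says
    the minimum is embedded additively with a weight between 0 and 1.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return responses.max(axis=0) + weight * responses.min(axis=0)


def pam(points, k, max_iter=100):
    """Simplified PAM: return the sorted indices of k medoids in `points`."""
    n = len(points)
    # Pairwise Euclidean distances between all points, shape (n, n).
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # ASSUMPTION: naive initialisation with the first k points.
    medoids = list(range(k))

    def cost(ms):
        # Sum over all points of the distance to the nearest medoid.
        return dist[:, ms].min(axis=1).sum()

    best = cost(medoids)
    for _ in range(max_iter):
        improved = False
        # SWAP phase: try replacing each medoid with each non-medoid point.
        for i in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = cand
                c = cost(trial)
                if c < best - 1e-12:
                    medoids, best, improved = trial, c, True
        if not improved:
            break
    return sorted(medoids)


# Toy usage: two scales of a 1x2 response map, and six 2-D "patches".
s1 = np.array([[[1.0, 4.0]], [[3.0, 2.0]]])
c1 = embed_min_into_max(s1, weight=0.5)

patches = np.array([[0, 0], [0, 1], [1, 0],
                    [10, 10], [10, 11], [11, 10]], dtype=float)
prototype_idx = pam(patches, k=2)
```

With `weight=0` the first function reduces to ordinary max pooling over scales, so the weight controls how far the result departs from max-only pooling. In the toy data the two selected medoids fall one in each of the two well-separated groups of patches.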

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, American University of Beirut, Beirut, 1107 2020, Lebanon
Ahmad W. Bitar, Mohamad M. Mansour & Ali Chehab

Corresponding author

Correspondence to Ali Chehab.

Editor information

Editors and Affiliations

Escola Superior de Tecnologia do IPS, Setúbal, Portugal
José Braz
Inria-Rennes/MimeTIC Team, Rennes cedex, France
Julien Pettré
LISA - ISTIA, University of Angers, Angers, France
Paul Richard
Linnaeus University, Växjö, Sweden
Andreas Kerren
Jacobs University, Bremen, Germany
Lars Linsen
Università di Catania, Catania, Italy
Sebastiano Battiato
Research Innovation Center, Canon U.S.A. Inc, San Jose, CA, USA
Francisco Imai

About this paper

Cite this paper

Bitar, A.W., Mansour, M.M., Chehab, A. (2016). Algorithmic Optimizations in the HMAX Model Targeted for Efficient Object Recognition. In: Braz, J., et al. Computer Vision, Imaging and Computer Graphics Theory and Applications. VISIGRAPP 2015. Communications in Computer and Information Science, vol 598. Springer, Cham. https://doi.org/10.1007/978-3-319-29971-6_20

DOI: https://doi.org/10.1007/978-3-319-29971-6_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-29970-9
Online ISBN: 978-3-319-29971-6
eBook Packages: Computer Science (R0)
