Hierarchical Image Representation Using Deep Network

  • Emrah Ergul
  • Sarp Erturk
  • Nafiz Arica
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9280)


In this paper, we propose a new method for feature learning from unlabeled data. Basically, we simulate the k-means algorithm in a deep network architecture to achieve hierarchical Bag-of-Words (BoW) representations. We first learn visual words in each layer, which are used to produce BoW feature vectors in the current input space. We transform the raw input data into new feature spaces in a convolutional manner, so that more abstract visual words are extracted at each layer, by applying the Expectation-Maximization (EM) algorithm: in the Expectation step, the network parameters are optimized while the visual words are kept fixed, and in the Maximization step, the visual words are updated using the current parameters of the network. In addition, we embed spatial information into the BoW representation by learning separate networks and visual words for each quadrant region. We compare the proposed algorithm with similar approaches in the literature on a challenging 10-class dataset, CIFAR-10.


Keywords: Deep network architectures · Image classification · Unsupervised feature extraction · Bag-of-words representation
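The alternating scheme described in the abstract (fix the visual words and assign inputs to them, then update the visual words from the assignments) can be illustrated for a single layer with plain k-means and hard-assignment BoW encoding. The sketch below is a minimal, hypothetical illustration in NumPy; all names and the toy data are our own assumptions, not the authors' multi-layer implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def learn_visual_words(patches, k, n_iters=10):
    """Learn k visual words from flattened patches via plain k-means."""
    centroids = patches[rng.choice(len(patches), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: map each patch to its nearest visual word.
        dists = ((patches[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Update step: recompute each visual word from its assigned patches.
        for j in range(k):
            members = patches[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids

def bow_encode(patches, centroids):
    """Hard-assignment Bag-of-Words histogram over the visual words."""
    dists = ((patches[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    hist = np.bincount(dists.argmin(axis=1), minlength=len(centroids)).astype(float)
    return hist / hist.sum()

# Toy data: 200 random 6x6 grayscale patches, flattened to 36-dim vectors.
patches = rng.normal(size=(200, 36))
words = learn_visual_words(patches, k=8)
feature = bow_encode(patches, words)
```

Spatial information, in the spirit of the paper's quadrant scheme, could then be added by computing one such histogram per image quadrant (with its own visual words) and concatenating the four histograms into the final feature vector.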



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Electronics & Communication Engineering Department, Kocaeli University, Kocaeli, Turkey
  2. Software Engineering Department, Bahcesehir University, Istanbul, Turkey
