Image classification method rationally utilizing spatial information of the image


In order to improve the accuracy of image classification problem, this paper proposes a new classification method based on image feature extraction and neural network. The method consists of two stages: image feature extraction and neural network classification. At the stage of feature extraction, spatial pyramid matching (SPM) feature, local position feature and global contour feature are extracted. The utilization of spatial information in SPM feature is effectively improved by combining the above three features. At the stage of neural network classification, a multi-hidden layer feedforward Histogram Intersection Kernel Weighted Learning Network (HWLN) is proposed to take advantage of three features to improve the classification accuracy. In the structure of network, the hidden layer output features are used as the bias of the input features to weight the coefficients of the features. The input layer and the hidden layers are directly connected with the output layer to realize the combination of linear mapping and nonlinear mapping. And the Histogram Intersection Kernel is used instead of random initialization of the input weight matrix. Taking combined features as input information, HWLN can realize the mapping relationship between input information and target category, so as to complete the image classification task. Extensive experiments are performed on Caltech 101, Caltech 256 and MSRC databases respectively. The experimental results show that the proposed method further utilizes the spatial information, and thus improves the accuracy of image classification.

This is a preview of subscription content, access via your institution.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8


  1. 1.

    Ahmed KT, Irtaza A, Iqbal MA (2017) Fusion of local and global features for effective image extraction. Appl Intell 47(2):526–543

    Article  Google Scholar 

  2. 2.

    Anwar H, Zambanini S, Kampel M (2014) Encoding spatial arrangements of visual words for rotation-invariant image classification. In: German Conference on Pattern Recognition, 443–452

  3. 3.

    Avila S, Thome N, Cord M et al (2013) Pooling in image representation:the visual Codeword point of view. Comput Vis Image Underst 117(5):453–465.

    Article  Google Scholar 

  4. 4.

    Boiman O, Shechtman E, Irani M (2008) In defense of nearest-neighbor based image classification. In: IEEE Conference on Computer Vision and Pattern Recognition. CVPR. 1–8

  5. 5.

    Bosch A, Zisserman A, Munoz X (2007) Image classification using random forests and ferns. In: Computer Vision, ICCV, pp 1–8

  6. 6.

    Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(3):389–396.

    Article  Google Scholar 

  7. 7.

    Csurka G, Dance CR, Fan L, Willamowski J, Bray C (2004) Visual categorization with bags of keypoints. In Workshop on statistical learning in computer vision, ECCV, 1–22

  8. 8.

    Cunha ALD, Zhou JP, Do MN (2006) The nonsubsampled contourlet transform: theory, design, and applications. IEEE Trans Image Process 15(10):3089–3101.

    Article  Google Scholar 

  9. 9.

    Deng WY, Ong YS, Zheng QH (2016) A fast reduced kernel extreme learning machine. Neural Netw 76:29–38

    Article  MATH  Google Scholar 

  10. 10.

    Frome A, Singer Y, Malik J (2007) Image retrieval and classification using local distance functions. In: Advances in neural information processing systems, 417–424

  11. 11.

    Frome A, Singer Y, Sha F, Malik J (2007) Learning globally-consistent local distance functions for shape-based image retrieval and classification. IEEE International Conference on Computer Vision

  12. 12.

    Goh H, Thome N, Cord M, Lim JH (2014) Learning deep hierarchical visual feature coding. IEEE Trans Neural Netw Learn Syst 25(12):2212–2225

    Article  Google Scholar 

  13. 13.

    Grauman K, Darrell T (2005) The pyramid match kernel: Discriminative classification with sets of image features. International Conference on Computer Vision. 1458–1465

  14. 14.

    Gui J, Liu T, Tao D, Tan T (2016) Representative vector machines: a unified framework for classical classifiers. IEEE Trans Cybernet 46(8):1877–1888

    Article  Google Scholar 

  15. 15.

    Hu J, Shen L, Sun G (2017) Squeeze-and-Excitation Networks. arXiv preprint arXiv:1709.01507

  16. 16.

    Huang GB, Zhu QY, Siew CK (2004) Extreme learning machine: a new learning scheme of feedforward neural networks. In: Neural Networks, 2004. Proceedings. 2004 IEEE International Joint Conference on, 2004. IEEE, 985–990

  17. 17.

    Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1–3):489–501

    Article  Google Scholar 

  18. 18.

    Huang FJ, Boureau YL, LeCun Y (2007) Unsupervised learning of invariant feature hierarchies with applications to object recognition. In: Computer Vision and Pattern Recognition, CVPR, 1–8

  19. 19.

    Huang GB, Zhou H, Ding X, Zhang R (2012) Extreme learning machine for regression and multiclass classification. IEEE Trans Syst Man Cybern Part B Cybern 42(2):513–529

    Article  Google Scholar 

  20. 20.

    Jégou H, Douze M, Schmid C, Pérez P (2010) Aggregating local descriptors into a compact image representation. In Computer Vision and Pattern Recognition. CVPR. 3304–3311

  21. 21.

    Juneja M, Vedaldi A, Jawahar CV, Zisserman A (2013) Blocks that shout: Distinctive parts for scene classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 923–930

  22. 22.

    Khan R, Barat C, Muselet D, Ducottet C (2015) Spatial histograms of soft pairwise similar patches to improve the bag-of-visual-words model. Comput Vision Image Understand 132:102–112.

    Article  Google Scholar 

  23. 23.

    Koniusz P, Yan F, Gosselin P, Mikolajczyk K (2017) Higher-order occurrence pooling for bags-of-words: visual concept detection. IEEE Trans Pattern Anal Mach Intell 39(2):313–326.

    Article  Google Scholar 

  24. 24.

    Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. IEEE Computer Society Conference on Computer Vision and Pattern Recognition New York, 2169–2178

  25. 25.

    Li G, Niu P, Duan X, Zhang X (2014) Fast learning network: a novel artificial neural network with a fast learning speed. Neural Comput & Applic 24(7–8):1683–1695

    Article  Google Scholar 

  26. 26.

    Li WS, Dong P, Xiao B, Zhou L (2016) Object recognition based on the region of interest and optimal bag of words model. Neurocomputing 172(8):271–280.

    Article  Google Scholar 

  27. 27.

    Li YQ, Wu C, Li HB (2018) Image classification method combining local position feature with global contour feature[J]. Acta Electron Sin 46(7):1726–1731.

    Google Scholar 

  28. 28.

    Li Q, Peng Q, Chen J, Yan C (2018) Improving image classification accuracy with ELM and CSIFT. Comput Sci Eng 99:1–1

    Google Scholar 

  29. 29.

    Liu LQ, Wang L, Liu XW (2011) In defense of soft-assignment coding. Proceedings of the International Conference on Computer Vision. 2486–2493.

  30. 30.

    Mansourian L, Abdullah MT, Abdullah LN, Azman A, Mustaffa MR, Applications (2018) An effective fusion model for image retrieval. Multimed Tools Appl, 77 (13):16131–16154

  31. 31.

    Microsoft Research Cambridge Object Recognition Image Database,

  32. 32.

    Nilsback ME, Zisserman A (2008) Automated flower classification over a large number of classes. In: Computer Vision, Graphics & Image Processing, 722–729

  33. 33.

    Perronnin F, Dance C (2007) Fisher kernels on visual vocabularies for image categorization. In: IEEE conference on computer vision and pattern recognition, 1–8

  34. 34.

    Perronnin F, Sánchez J, Mensink T (2010) Improving the fisher kernel for large-scale image classification. In: European conference on computer vision, 143–156

  35. 35.

    Sánchez J, Perronnin F, Mensink T, Verbeek J (2013) Image classification with the fisher vector: theory and practice. Int J Comput Vis 105(3):222–245

    MathSciNet  Article  MATH  Google Scholar 

  36. 36.

    Van Gemert JC, Veenman CJ, Smeulders AW, Geusebroek JM (2009) Visual word ambiguity. IEEE Trans Patt Anal Mach Intell 32(7):1271–1283

    Article  Google Scholar 

  37. 37.

    Wang JY, Yang JC, Yu K, et al (2010) Locality-constrained linear coding for image classification. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 3360–3367

  38. 38.

    Wang S, Lu J, Gu X, Yang J (2016) Semi-supervised linear discriminant analysis for dimension reduction and classification. Pattern Recogn 57:179–189

    Article  MATH  Google Scholar 

  39. 39.

    Xiong W, Zhang L, Du B, Tao D (2017) Combining local and global: rich and robust feature pooling for visual recognition. Pattern Recogn 62:225–235

    Article  Google Scholar 

  40. 40.

    Zafar B, Ashraf R, Ali N, Ahmed M, Jabbar S, Chatzichristofis SA (2018) Image classification by addition of spatial information based on histograms of orthogonal vectors. PLoS One 13(6):e0198175

    Article  Google Scholar 

  41. 41.

    Zhu QH, Wang ZZ, Mao XJ, Yang YB (2017) Spatial locality-preserving feature coding for image classification. Appl Intell 47(1):148–157

    Article  Google Scholar 

  42. 42.

    Zou J, Li W, Chen C, Du Q (2016) Scene classification using local and global features with collaborative representation fusion. Inf Sci 348:209–226

    MathSciNet  Article  Google Scholar 

Download references


This work is supported by National Natural Science Foundation of China (No. 51641609), Natural Science Foundation of Hebei Province of China (No. F2015203212).

Author information



Corresponding author

Correspondence to Yaqian Li.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Wu, C., Li, Y., Zhao, Z. et al. Image classification method rationally utilizing spatial information of the image. Multimed Tools Appl 78, 19181–19199 (2019).

Download citation


  • Spatial pyramid matching
  • Local position feature
  • Global contour feature
  • Neural network
  • Histogram intersection kernel weighted learning network