Knowledge and Information Systems, Volume 52, Issue 2, pp 509–530

Learning extremely shared middle-level image representation for scene classification

  • Peng Tang
  • Jin Zhang
  • Xinggang Wang
  • Bin Feng
  • Fabio Roli
  • Wenyu Liu
Regular Paper

Abstract

Learning middle-level image representations is important in computer vision, especially for scene classification. Currently available middle-level image representations are not sparse enough to keep training and testing times manageable as the number of classes that users want to recognize grows. In this work, we propose a middle-level image representation based on patterns that are extremely shared among different classes, which reduces both training and testing time. The proposed learning algorithm first finds class-specific patterns and then uses lasso regularization to select the most discriminative patterns shared among different classes. Experimental results on widely used scene classification benchmarks (15 Scenes, MIT-indoor 67, SUN 397) show that only a small number of patterns is needed to achieve remarkable performance with reduced computation time.
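The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' formulation, which this page does not give. It assumes each image is already described by a vector of candidate pattern responses, fits a one-vs-rest lasso model per class with plain iterative soft-thresholding, and keeps every pattern that receives a nonzero weight for at least one class. The function names, the toy data, and the value of `lam` are all hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of the L1 norm: shrink every entry towards zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_ista(X, y, lam, n_iter=500):
    """Minimise (1/2n)*||Xw - y||^2 + lam*||w||_1 by iterative soft-thresholding."""
    n, d = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w


def select_shared_patterns(responses, labels, n_classes, lam=0.2):
    """Fit one-vs-rest lasso models and keep patterns used by at least one class."""
    X = responses - responses.mean(axis=0)  # centring stands in for an intercept
    W = np.zeros((responses.shape[1], n_classes))
    for c in range(n_classes):
        y = np.where(labels == c, 1.0, -1.0)
        W[:, c] = lasso_ista(X, y - y.mean(), lam)
    used_by = (W != 0).sum(axis=1)      # number of classes that use each pattern
    selected = np.flatnonzero(used_by)  # patterns with at least one nonzero weight
    return selected, used_by


def make_toy_data(seed=0, n_images=300, n_patterns=30, n_classes=3):
    """Hypothetical data: pattern c fires strongly on class c, the rest is noise."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, n_classes, size=n_images)
    responses = rng.normal(size=(n_images, n_patterns))
    for c in range(n_classes):
        responses[labels == c, c] += 2.0
    return responses, labels


responses, labels = make_toy_data()
selected, used_by = select_shared_patterns(responses, labels, n_classes=3)
print("kept", len(selected), "of", responses.shape[1], "patterns:", selected)
print("classes using each kept pattern:", used_by[selected])
```

In this sketch the `used_by` count is a rough proxy for how widely a pattern is shared: a pattern with a nonzero weight in several one-vs-rest models contributes to several classes, so fewer patterns need to be evaluated at test time.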

Keywords

Scene classification · Middle-level image representation · Extremely shared patterns

Notes

Acknowledgements

We thank anonymous reviewers for their very useful comments and suggestions. This work was supported in part by the National Natural Science Foundation of China under Grant 61572207 and Grant 61503145, and the CAST Young Talent Supporting Program.

Copyright information

© Springer-Verlag London 2016

Authors and Affiliations

  • Peng Tang (1)
  • Jin Zhang (1)
  • Xinggang Wang (1)
  • Bin Feng (1)
  • Fabio Roli (2)
  • Wenyu Liu (1)

  1. School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China
  2. Department of Electrical and Electronic Engineering, University of Cagliari, Cagliari, Italy
