Learning extremely shared middle-level image representation for scene classification

Tang, Peng; Zhang, Jin; Wang, Xinggang; Feng, Bin; Roli, Fabio; Liu, Wenyu

doi:10.1007/s10115-016-1015-z

Regular Paper
Published: 23 December 2016

Volume 52, pages 509–530, (2017)

Knowledge and Information Systems

Peng Tang¹,
Jin Zhang¹,
Xinggang Wang¹,
Bin Feng¹,
Fabio Roli² &
Wenyu Liu¹

Abstract

Learning middle-level image representations is important in computer vision, especially for scene classification. Existing middle-level representations are not sparse enough to keep training and testing times manageable as the number of classes to be recognized grows. In this work, we propose a middle-level image representation built from patterns that are extremely shared among different classes, which reduces both training and test time. The proposed learning algorithm first finds class-specific patterns and then uses lasso regularization to select the most discriminative patterns shared among different classes. Experimental results on widely used scene classification benchmarks (15 Scenes, MIT-indoor 67, SUN 397) show that very few patterns are needed to achieve strong performance with reduced computation time.
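The selection step described in the abstract can be illustrated with a small sketch. The abstract does not state the objective, so everything below is an assumption made for illustration: each image is a vector of pattern responses, one squared-loss lasso is fitted per class against one-vs-rest targets of +1/-1, and a pattern counts as "shared" when more than one class keeps it with a nonzero weight. The function names, the toy data, and the value of `lam` are hypothetical and are not taken from the paper or its released code.

```python
def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm: shrink value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iters=200):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent.

    Columns of X and the target y are centred first, which is equivalent
    to fitting an unpenalised intercept.
    """
    n, p = len(X), len(X[0])
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_mean[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    residual = [yi - y_mean for yi in y]  # equals y - Xw while w is all zeros
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    w = [0.0] * p
    for _ in range(n_iters):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant column carries no information
            # Correlation of column j with the residual that excludes w[j].
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta
                w[j] = new_wj
    return w


def select_shared_patterns(X, labels, n_classes, lam):
    """Map each pattern kept by at least one class to the classes that use it."""
    usage = {}
    for c in range(n_classes):
        y = [1.0 if label == c else -1.0 for label in labels]
        w = lasso_coordinate_descent(X, y, lam)
        for j, wj in enumerate(w):
            if wj != 0.0:
                usage.setdefault(j, []).append(c)
    return usage


def toy_dataset():
    """Three classes; patterns 0 and 1 depend on the class, 2 and 3 are noise."""
    class_signature = {0: (1.0, 0.0), 1: (1.0, 1.0), 2: (0.0, 1.0)}
    noise_combos = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
    X, labels = [], []
    for c, signature in class_signature.items():
        for noise in noise_combos:
            X.append(list(signature + noise))
            labels.append(c)
    return X, labels


if __name__ == "__main__":
    X, labels = toy_dataset()
    usage = select_shared_patterns(X, labels, n_classes=3, lam=0.05)
    for pattern in sorted(usage):
        print(f"pattern {pattern} is used by classes {usage[pattern]}")
```

On this toy data the two class-dependent patterns are each kept by two of the three classifiers, while both noise patterns receive exactly zero weight in every classifier, so the final representation needs only two of the four candidate patterns. A real pipeline would replace the hand-written solver with an established L1 solver and the toy matrix with pattern-detector responses on actual images.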

Notes

The terms “words”, “parts”, and “patterns” are used interchangeably; this paper uses “patterns” throughout.
15 Scenes: http://www-cvr.ai.uiuc.edu/ponce_grp/data/scene_categories/. MIT-indoor 67: http://web.mit.edu/torralba/www/indoor.html. SUN 397: http://vision.princeton.edu/projects/2010/SUN/.
The implementation code and trained models are available at https://github.com/hust-tp/ESMIR.

Acknowledgements

We thank the anonymous reviewers for their useful comments and suggestions. This work was supported in part by the National Natural Science Foundation of China under Grant 61572207 and Grant 61503145, and the CAST Young Talent Supporting Program.

Author information

Authors and Affiliations

School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, 430074, China
Peng Tang, Jin Zhang, Xinggang Wang, Bin Feng & Wenyu Liu
Department of Electrical and Electronic Engineering, University of Cagliari, 09123, Cagliari, Italy
Fabio Roli

Corresponding author

Correspondence to Xinggang Wang.

About this article

Cite this article

Tang, P., Zhang, J., Wang, X. et al. Learning extremely shared middle-level image representation for scene classification. Knowl Inf Syst 52, 509–530 (2017). https://doi.org/10.1007/s10115-016-1015-z

Received: 23 May 2016
Revised: 17 August 2016
Accepted: 03 December 2016
Published: 23 December 2016
Issue Date: August 2017
DOI: https://doi.org/10.1007/s10115-016-1015-z
