Beyond Bag-of-Words: combining generative and discriminative models for scene categorization

Li, Zhen; Yap, Kim-Hui

doi:10.1007/s11042-012-1245-3

Published: 16 October 2012

Volume 71, pages 1033–1050, (2014)

Multimedia Tools and Applications

Abstract

This paper proposes an efficient framework for scene categorization that combines a generative model with a discriminative model. A state-of-the-art approach to scene categorization is the Bag-of-Words (BoW) framework. However, scenes span many categories, and in general the BoW codebook must be regenerated whenever a new category is considered, which is computationally expensive. This paper addresses the issue by designing a new framework with good scalability: when an additional category is considered, the computational cost is much lower, while the resulting image signatures remain discriminative. The image signatures used to train the discriminative model are carefully designed based on the generative model. The soft relevance values of the extracted image signatures are estimated by modeling the image signature space and are incorporated into a Fuzzy Support Vector Machine (FSVM). The effectiveness of the proposed method is validated on the UIUC Scene-15 and NTU-25 datasets, where it is shown to outperform other state-of-the-art approaches for scene categorization.
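The abstract does not say which generative model is used or how the signatures and relevance values are computed, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes one diagonal Gaussian per category as a stand-in generative model, a signature made of per-category average log-likelihoods, and a distance-to-centroid heuristic for the soft relevance values. All function names and the synthetic data are hypothetical.

```python
import math
import random


def fit_diag_gaussian(descriptors):
    """Fit one diagonal Gaussian (assumed stand-in generative model)."""
    n, dim = len(descriptors), len(descriptors[0])
    mean = [sum(d[j] for d in descriptors) / n for j in range(dim)]
    var = [sum((d[j] - mean[j]) ** 2 for d in descriptors) / n + 1e-6
           for j in range(dim)]
    return mean, var


def avg_log_likelihood(descriptors, model):
    """Average log-likelihood of an image's local descriptors under a model."""
    mean, var = model
    total = 0.0
    for d in descriptors:
        for x, m, v in zip(d, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(descriptors)


def signature(descriptors, models):
    """One entry per category model, in the order categories were added."""
    return [avg_log_likelihood(descriptors, models[c]) for c in models]


def soft_relevance(signatures, delta=0.1):
    """Assumed heuristic: weights in (0, 1], higher near the class centroid."""
    dim = len(signatures[0])
    centroid = [sum(s[j] for s in signatures) / len(signatures)
                for j in range(dim)]
    dists = [math.dist(s, centroid) for s in signatures]
    radius = max(dists)
    return [1.0 - d / (radius + delta) for d in dists]


def sample_descriptors(center, n, rng):
    """Synthetic local descriptors scattered around a category center."""
    return [[rng.gauss(c, 1.0) for c in center] for _ in range(n)]


rng = random.Random(0)

# Fit one generative model per category (synthetic 2-D descriptors).
models = {
    "coast": fit_diag_gaussian(sample_descriptors([0.0, 0.0], 200, rng)),
    "forest": fit_diag_gaussian(sample_descriptors([5.0, 5.0], 200, rng)),
}

coast_image = sample_descriptors([0.0, 0.0], 50, rng)
sig_before = signature(coast_image, models)

# Adding a category fits only its own model; existing models are untouched,
# so existing signature entries stay valid and one new entry is appended.
models["street"] = fit_diag_gaussian(sample_descriptors([-5.0, 5.0], 200, rng))
sig_after = signature(coast_image, models)

# Soft relevance values for a small set of training signatures of one class.
coast_sigs = [signature(sample_descriptors([0.0, 0.0], 50, rng), models)
              for _ in range(10)]
weights = soft_relevance(coast_sigs)
```

In a fuller pipeline the `weights` would be passed to the classifier as per-sample weights, which is one common way to approximate a fuzzy SVM. The point of the sketch is the scalability argument: a new category costs one extra model fit rather than a rebuild of a shared codebook.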

References

Boiman O, Shechtman E, Irani M (2008) In defense of nearest-neighbor based image classification. In: IEEE conference on computer vision and pattern recognition, pp 1–8
Bosch A, Zisserman A, Muñoz X (2008) Scene classification using a hybrid generative/discriminative approach. IEEE Trans Pattern Anal Mach Intell 30(4):712–727
Csurka G, Dance C, Fan L, Willamowski J, Bray C (2004) Visual categorization with bags of keypoints. In: Workshop on statistical learning in computer vision, European conference on computer vision
Deselaers T, Heigold G, Ney H (2010) Object classification by fusing SVMs and Gaussian mixtures. Pattern Recogn 43(7):2476–2484
Dorko G, Schmid C (2005) Object class recognition using discriminative local features. In: INRIA Technical Report, RR-5497
Fifteen Scene Categories, http://www-cvr.ai.uiuc.edu/ponce_grp/data
Jiang Y-G, Ngo C-W, Yang J (2007) Towards optimal bag-of-features for object categorization and semantic video retrieval. In: ACM international conference on image and video retrieval
Li T, Mei T, Kweon I-S, Hua X-S (2011) Contextual bag-of-words for visual categorization. IEEE Trans Circuits Syst Video Technol 21(4):381–392
Li Z, Yap K-H, Chen X (2011) Beyond bags of words: combining generative and discriminative models for natural scene categorization. In: International conference on acoustics, speech and signal processing, pp 965–968
Nowak E, Jurie F, Triggs B (2006) Sampling strategies for bag-of-features image classification. In: European conference on computer vision, pp 490–503
Sivic J, Zisserman A (2003) Video google: a text retrieval approach to object matching in videos. In: Proc. int. conf. comput. vis., vol 2, pp 1470–1477
Smeulders AW, Worring M, Santini S, Gupta A, Jain R (2000) Content-based image retrieval at the end of the early years. IEEE Trans Pattern Anal Mach Intell 22(12):1349–1380
Swain M, Ballard D (1991) Color indexing. Int J Comput Vis 7(1):11–32
Szummer M, Picard RW (1998) Indoor-outdoor image classification. In: IEEE international workshop on content-based access of image and video database, pp 42–51
van Gemert JC, Veenman CJ, Smeulders AWM, Geusebroek JM (2010) Visual word ambiguity. IEEE Trans Pattern Anal Mach Intell 32(7):1271–1283
Wu L, Hoi SCH, Yu NH (2010) Semantics-preserving bag-of-words models and applications. IEEE Trans Image Process 19(7):1908
Yu Z, Wong HS (2006) FEMA: A fast expectation maximization algorithm based on grid and PCA. In: IEEE international conference on multimedia & expo, pp 1913–1916
Zhang J, Marszalek M, Lazebnik S, Schmid C (2007) Local features and kernels for classification of texture and object categories: a comprehensive study. Int J Comput Vis 73(2):213–238

Acknowledgements

This work is supported by the Agency for Science, Technology and Research (A*STAR), Singapore, under SERC Grant 062 130 0055. We thank Dr. J. C. van Gemert for kindly providing the source code of UNC in [15]. We also thank the anonymous reviewers for their valuable suggestions, which significantly improved the quality of the paper.

Author information

Authors and Affiliations

School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
Zhen Li & Kim-Hui Yap

Corresponding author

Correspondence to Zhen Li.

About this article

Cite this article

Li, Z., Yap, KH. Beyond Bag-of-Words: combining generative and discriminative models for scene categorization. Multimed Tools Appl 71, 1033–1050 (2014). https://doi.org/10.1007/s11042-012-1245-3

Issue Date: August 2014