Abstract
Latent semantic models (e.g., PLSA and LDA) have been used successfully in document analysis. In recent years, many of these models have also proven promising for visual content analysis tasks such as image clustering and classification. Topics and words, two key components of latent semantic models, have explicit semantic meaning in document analysis. In visual content analysis, however, these topics and words are difficult to describe or represent, which usually leads to failure in practice. In this paper, we simultaneously consider topic consistency and word consistency in semantic space to adapt the traditional PLSA model to visual content analysis tasks. In our model, an ℓ1-graph is constructed to model the local neighborhood structure of images in feature space, and word co-occurrence is computed to capture local word consistency. This local information is then incorporated into the model for topic discovery. Finally, the generalized EM algorithm is used to estimate the parameters. Extensive experiments on publicly available databases demonstrate the effectiveness of our approach.
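For orientation, the traditional PLSA model that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of plain PLSA fitted with standard EM on a bag-of-visual-words count matrix. It is not the model proposed in the paper: the ℓ1-graph and word co-occurrence terms and the generalized EM step are omitted. The function name `plsa` and the parameters `counts`, `n_topics`, `n_iter`, and `seed` are hypothetical names chosen for this sketch.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Plain PLSA fitted with EM on a (documents x words) count matrix.

    Returns P(z|d) with shape (docs, topics) and P(w|z) with shape
    (topics, words). Assumption: no consistency regularisation is applied.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    eps = 1e-12  # guards against division by zero

    # Random initial distributions, each row normalised to sum to one.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: posterior P(z | d, w), shape (docs, words, topics).
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        post = joint / (joint.sum(axis=2, keepdims=True) + eps)

        # M-step: re-estimate both distributions from expected counts.
        expected = counts[:, :, None] * post
        p_w_z = expected.sum(axis=0).T
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = expected.sum(axis=1)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps

    return p_z_d, p_w_z
```

Here each row of `counts` would hold the visual-word histogram of one image. The dense (docs, words, topics) posterior keeps the sketch short but would need a sparse formulation for realistic vocabulary sizes.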
Cite this article
Cheng, J., Li, P., Rui, T. et al. Learning latent semantic model with visual consistency for image analysis. Multimed Tools Appl 74, 1341–1356 (2015). https://doi.org/10.1007/s11042-014-1916-3