Abstract
In this paper, we first propose an extension of probabilistic latent semantic analysis (PLSA) that models continuous quantities, and we derive a corresponding EM algorithm to estimate its parameters. We then apply this model to automatic image annotation. To handle each modality according to its characteristics, we present a semantic annotation model that uses continuous PLSA for visual features and traditional PLSA for textual words. The two models are linked by sharing the same distribution over aspects, and an asymmetric learning approach is adopted to estimate the model parameters. Because the model associates the visual and textual modalities more precisely and effectively, it predicts semantic annotations well for unseen images. We evaluate our approach on the Corel5k and Corel30k datasets. The experimental results show that it outperforms several state-of-the-art approaches.
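The abstract does not give the model equations, so the following is only a minimal sketch of how EM for a continuous PLSA could look, not the authors' implementation. It assumes that each aspect emits visual feature vectors from a diagonal-covariance Gaussian and that each image has its own aspect distribution P(z|d). The function names, the initialization, and the variance floor are all hypothetical choices made for illustration.

```python
import numpy as np


def log_gauss_diag(x, mean, var):
    """Log density of diagonal Gaussians. x: (n, D); mean, var: (K, D) -> (n, K)."""
    diff = x[:, None, :] - mean[None, :, :]
    quad = np.sum(diff ** 2 / var, axis=2)
    log_det = np.sum(np.log(2.0 * np.pi * var), axis=1)
    return -0.5 * (quad + log_det)


def continuous_plsa_em(x, doc_ids, n_docs, n_aspects, n_iter=50, seed=0,
                       var_floor=1e-6):
    """EM sketch for a PLSA-style model with Gaussian aspect emissions.

    x        : (n, D) feature vectors (e.g. region descriptors), all images stacked
    doc_ids  : (n,) integer index of the image each vector belongs to
    Assumes every image contributes at least one feature vector.
    Returns aspect means, variances, P(z|d), and the log-likelihood trace.
    """
    rng = np.random.default_rng(seed)
    n, _ = x.shape
    # Hypothetical initialization: random data points as means, global variance.
    mean = x[rng.choice(n, size=n_aspects, replace=False)].copy()
    var = np.tile(x.var(axis=0) + var_floor, (n_aspects, 1))
    p_z_d = np.full((n_docs, n_aspects), 1.0 / n_aspects)
    log_liks = []

    for _ in range(n_iter):
        # E-step: posterior P(z | d, x) for every feature vector.
        log_joint = np.log(p_z_d[doc_ids] + 1e-300) + log_gauss_diag(x, mean, var)
        log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
        resp = np.exp(log_joint - log_norm)
        log_liks.append(float(log_norm.sum()))

        # M-step: re-estimate the Gaussian parameters of each aspect.
        nk = resp.sum(axis=0) + 1e-12
        mean = (resp.T @ x) / nk[:, None]
        diff2 = (x[:, None, :] - mean[None, :, :]) ** 2
        var = np.einsum("nk,nkd->kd", resp, diff2) / nk[:, None] + var_floor

        # M-step: re-estimate the per-image aspect distribution P(z|d).
        counts = np.zeros((n_docs, n_aspects))
        np.add.at(counts, doc_ids, resp)
        p_z_d = counts / counts.sum(axis=1, keepdims=True)

    return mean, var, p_z_d, log_liks
```

In an annotation setting such as the one described above, the estimated P(z|d) for an image would be combined with per-aspect word distributions from a traditional PLSA to score candidate words. That step, and the asymmetric learning schedule, are not shown here.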
Copyright information
© 2012 IFIP International Federation for Information Processing
About this paper
Cite this paper
Li, Z., Shi, Z., Tang, Z., Zhao, W. (2012). A Novel Model for Semantic Learning and Retrieval of Images. In: Shi, Z., Leake, D., Vadera, S. (eds) Intelligent Information Processing VI. IIP 2012. IFIP Advances in Information and Communication Technology, vol 385. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32891-6_42
DOI: https://doi.org/10.1007/978-3-642-32891-6_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32890-9
Online ISBN: 978-3-642-32891-6
eBook Packages: Computer Science, Computer Science (R0)