
Multimedia Tools and Applications

Volume 56, Issue 3, pp 509–534

A non-parametric visual-sense model of images—extending the cluster hypothesis beyond text

  • Kong-Wah Wan (corresponding author)
  • Ah-Hwee Tan
  • Joo-Hwee Lim
  • Liang-Tien Chia

Abstract

The main challenge of a search engine is to find information that is relevant and appropriate. This becomes difficult when queries contain ambiguous words. Rijsbergen first hypothesized a clustering approach for web pages wherein closely associated pages are treated as a semantic group with the same relevance to the query (Rijsbergen 1979). In this paper, we extend Rijsbergen's cluster hypothesis to multimedia content such as images. Given a user query, the polysemy in the returned image set is related to the many possible meanings of the query. We develop a method to cluster the polysemous images into their semantic categories. The resulting clusters can be seen as the visual senses of the query, which collectively embody its visual interpretations. At the heart of our method is a non-parametric Bayesian approach that exploits the complementary text and visual information of images for semantic clustering. Latent structures of polysemous images are mined using the Hierarchical Dirichlet Process (HDP), a non-parametric Bayesian model that represents each image as a mixture of components. The same set of components is used to model all images, with only the mixture weights varying among images. The main advantage of our model is that the number of mixture components is not fixed a priori, but is determined during posterior inference. This allows our model to grow with the level of polysemy (and visual diversity) of the images. Evaluation results on a large collection of web images show the efficacy of our approach.
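The two structural properties stated in the abstract (one shared set of components, with only the mixture weights varying per image) can be illustrated with a small generative sketch. This is not the authors' inference procedure: it only samples weights from a truncated stick-breaking approximation of an HDP prior, and the function names, the truncation level, and the concentration values `gamma` and `alpha` are all assumptions chosen for illustration.

```python
import random


def stick_breaking(concentration, truncation, rng):
    """Truncated stick-breaking weights: split a unit stick into `truncation` pieces."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        frac = rng.betavariate(1.0, concentration)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)  # last piece takes the leftover, so the weights sum to 1
    return weights


def image_weights(global_weights, alpha, rng):
    """Per-image weights over the SAME components, drawn from Dirichlet(alpha * global_weights)."""
    # A Dirichlet draw is a set of normalized gamma draws; the floor guards against a zero shape.
    draws = [rng.gammavariate(max(alpha * b, 1e-12), 1.0) for b in global_weights]
    total = sum(draws)
    return [d / total for d in draws]


if __name__ == "__main__":
    rng = random.Random(0)
    # Global weights over shared components (hypothetical gamma=2.0, truncation=15).
    beta = stick_breaking(2.0, 15, rng)
    # Three hypothetical images reuse the same components with different weights.
    for j in range(3):
        pi = image_weights(beta, 5.0, rng)
        active = sum(1 for w in pi if w > 0.05)
        print(f"image {j}: {active} components with weight > 0.05")
```

The truncation level here is only a computational upper bound. In the non-parametric model the number of components actually used is inferred from the data, which is what lets it grow with the polysemy of the query.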

Keywords

Hierarchical Dirichlet Process; Non-parametric models; Image clustering; Sense disambiguation

Notes

Acknowledgements

The authors would like to thank Sujoy Roy, Yap Ghim Eng, Sim Tze Jan, Sim Khe Chai and Wang Yue for valuable discussions, and three student helpers for their labeling effort.

References

  1. Agrawal R, Gollapudi S, Halverson A, Ieong S (2009) Diversifying search results. In: Proc of the second ACM international conference on web search and data mining, pp 5–14
  2. Ali K, Stam V (2004) TiVo: making show recommendations using a distributed collaborative filtering architecture. In: Proc ACM international conference on knowledge discovery and data mining, pp 394–401
  3. Arni T, Clough P, Sanderson M, Grubinger M (2008) Overview of the ImageCLEFphoto 2008 photographic retrieval task. In: Working notes of the 2008 CLEF workshop
  4. Blei D, Ng A, Jordan M (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
  5. Blei D, Griffiths T, Jordan M (2010) The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. J ACM 57(2):1–30
  6. Bradley P, Fayyad U (1998) Refining initial points for k-means clustering. In: Proc international conference on machine learning, pp 91–99
  7. Cai D, He X, Li Z, Ma W, Wen J (2004) Hierarchical clustering of WWW image search results using visual, textual and link information. In: Proc multimedia, pp 952–959
  8. Carbonell J, Goldstein J (1998) The use of MMR, diversity-based reranking for reordering documents and producing summaries. In: Proc ACM SIGIR conference on research and development in information retrieval, pp 335–336
  9. Cilibrasi R, Vitanyi P (2007) The Google similarity distance. IEEE Trans Knowl Data Eng 19(3):370–383
  10. Clarke C, Kolla M, Cormack G, Vechtomova O, Ashkan A, Buttcher S, MacKinnon I (2008) Novelty and diversity in information retrieval evaluation. In: Proc ACM SIGIR conference on research and development in information retrieval, pp 659–666
  11. Cutting D, Karger D, Pedersen J, Tukey J (1992) Scatter/gather: a cluster-based approach to browsing large document collections. In: Proc ACM SIGIR conference on research and development in information retrieval
  12. Fergus R, Li F, Perona P, Zisserman A (2005) Learning object categories from Google's image search. In: Proc international conference on computer vision
  13. Ferguson T (1973) A Bayesian analysis of some nonparametric problems. Ann Stat 1:209–230
  14. Grauman K, Darrell T (2006) Unsupervised learning of categories from sets of partially matching image features. In: Proc computer vision and pattern recognition
  15. Hoffman M, Blei D, Cook P (2008) Content-based musical similarity computation using the hierarchical Dirichlet process. In: Proc international conference on music information retrieval
  16. Jurie F, Triggs B (2005) Creating efficient codebooks for visual recognition. In: Proc international conference on computer vision, vol 1, pp 604–610
  17. Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: Proc conference on computer vision and pattern recognition, pp 2169–2178
  18. Leuken V, Reinier H, Garcia L, Olivares X, Roelof V (2009) Visual diversification of image search results. In: Proc of the 18th international conference on world wide web, pp 341–350
  19. Li L, Wang G, Li F (2007) Optimol: automatic object picture collection via incremental model learning. In: Proc computer vision and pattern recognition
  20. Li H, Tang J, Li G, Chua T (2008) Word2image: towards visual interpreting of words. In: Proc ACM international conference on multimedia, pp 813–816
  21. Loeff N, Alm C, Forsyth D (2006) Discriminating image senses by clustering with multimodal features. In: Proc COLING/ACL, pp 547–554
  22. Lowe D (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110
  23. Matas J, Chum O, Urban M, Pajdla T (2002) Robust wide baseline stereo from maximally stable extremal regions. In: Proc British machine vision conference, pp 384–396
  24. Mihalcea R (2007) Using Wikipedia for automatic word sense disambiguation. In: Proc the annual conference of the North American Chapter of the Association for Computational Linguistics
  25. Ng A, Jordan M, Weiss Y (2001) On spectral clustering: analysis and an algorithm. In: Advances in neural information processing systems, vol 14, pp 849–856
  26. Philbin J, Chum O, Isard M, Sivic J, Zisserman A (2007) Object retrieval with large vocabularies and fast spatial matching. In: Proc computer vision and pattern recognition
  27. Quelhas P, Monay F, Odobez J, Gatica-Perez D, Tuytelaars T, Gool LV (2005) Modeling scenes with local descriptors and latent aspects. In: Proc international conference on computer vision
  28. Rasmussen C (2000) The infinite Gaussian mixture model. In: Neural information processing systems
  29. Rijsbergen C (1979) Information retrieval. University of Glasgow
  30. Saenko K, Darrell T (2008) Unsupervised learning of visual sense models for polysemous words. In: Proc neural information processing systems
  31. Schroff F, Criminisi A, Zisserman A (2007) Harvesting image databases from the web. In: Proc international conference on computer vision
  32. Sivic J, Russell B, Zisserman A, Freeman W, Efros A (2008) Unsupervised discovery of visual object class hierarchies. In: Proc computer vision and pattern recognition
  33. Song K, Tian Y, Gao W, Huang T (2006) Diversifying the image retrieval results. In: Proc multimedia, pp 707–710
  34. Teh Y, Jordan M, Beal M, Blei D (2007) Hierarchical Dirichlet processes. J Am Stat Assoc 101(476):1556–1581
  35. Turney P (2002) Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In: Proc association of computational linguistics, pp 417–424
  36. Vedaldi A, Fulkerson B (2008) VLFeat: an open and portable library of computer vision algorithms. http://www.vlfeat.org/. Accessed Sep 2009
  37. Vivisimo (2009) Vivisimo web clustering. http://vivisimo.com/. Accessed Jan 2010
  38. Wan K, Tan A, Lim J, Chia L (2009) A latent model for visual disambiguation of keyword-based image search. In: Proc British machine vision conference
  39. Wan K, Tan A, Lim J, Chia L (2010) Faceted topic retrieval of news video using joint topic modeling of visual features and speech transcripts. In: Proc international conference on multimedia and expo
  40. Wang S, Jing F, He J, Du Q, Zhang L (2007) Igroup: presenting web image search results in semantic clusters. In: Proc of the SIGCHI conference on human factors in computing systems, pp 587–596
  41. Wei X, Croft W (2006) LDA-based document models for ad-hoc retrieval. In: Proc ACM SIGIR conference on research and development in information retrieval, pp 178–185
  42. Wikipedia (2010) English dumps in SQL and XML. http://download.wikimedia.org/enwiki/20100116/. Accessed Feb 2010
  43. Xing E, Sohn K, Jordan M, Teh Y (2006) Bayesian multi-population haplotype inference via a hierarchical Dirichlet process mixture. In: Proc international conference on machine learning
  44. Zeng H, He Q, Chen Z, Ma W, Ma J (2004) Learning to cluster web search results. In: Proc ACM SIGIR conference on research and development in information retrieval
  45. Zhai C, Cohen W, Lafferty J (2003) Beyond independent relevance: methods and evaluation metrics for subtopic retrieval. In: Proc ACM SIGIR conference on research and development in information retrieval, pp 10–17
  46. Ziegler C, McNee S, Konstan J, Lausen G (2005) Improving recommendation lists through topic diversification. In: Proc international conference on world wide web, pp 22–32

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Kong-Wah Wan (1, corresponding author)
  • Ah-Hwee Tan (2)
  • Joo-Hwee Lim (1)
  • Liang-Tien Chia (2)

  1. Institute for Infocomm Research, Singapore, Singapore
  2. School of Computer Engineering, Nanyang Technological University, Singapore, Singapore
