Multimedia Tools and Applications, Volume 56, Issue 3, pp 535–552

Incremental visual objects clustering with the growing vocabulary tree

  • Zhenyong Fu
  • Hongtao Lu
  • Wenbin Li


With the bag-of-visual-words image representation, text analysis methods such as pLSA and LDA can be applied to visual object clustering and classification. However, previous work used only a fixed visual vocabulary, formed by vector-quantizing SIFT-like region descriptors, so the learned visual topic models are tied to that fixed vocabulary. This paper presents a novel approach that clusters visual objects incrementally. Given a new batch of images, we first expand the visual vocabulary to include the new visual words, then adjust the object clustering model to absorb these words, and finally output the clustering result. We achieve this by adapting the incremental pLSA model, previously used for text analysis, to the visual domain. Experimental results on images from seven categories in a dynamic environment demonstrate the feasibility and stability of the growing vocabulary tree, as well as the clustering performance.
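The vocabulary-expansion step described above can be illustrated with a small sketch. This is not the paper's growing vocabulary tree: it assumes a flat vocabulary and a simple rule in which a descriptor becomes a new visual word when it lies farther than a chosen radius from every existing word. The function names (`nearest_word`, `grow_vocabulary`, `histogram`, `pad`), the `radius` parameter, and the 2-D toy descriptors (real SIFT descriptors are 128-dimensional) are all hypothetical and chosen only for illustration.

```python
import math


def nearest_word(vocab, desc):
    """Return (index, distance) of the visual word closest to desc."""
    best_i, best_d = -1, math.inf
    for i, word in enumerate(vocab):
        d = math.dist(word, desc)
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d


def grow_vocabulary(vocab, descriptors, radius):
    """Append a new word for each descriptor farther than `radius` from
    every existing word (assumed growth rule, not the paper's tree).
    Returns the number of words added."""
    added = 0
    for desc in descriptors:
        _, d = nearest_word(vocab, desc)
        if d > radius:
            vocab.append(tuple(desc))
            added += 1
    return added


def histogram(vocab, descriptors):
    """Bag-of-visual-words count vector for one image."""
    if not vocab:
        raise ValueError("vocabulary is empty")
    counts = [0] * len(vocab)
    for desc in descriptors:
        i, _ = nearest_word(vocab, desc)
        counts[i] += 1
    return counts


def pad(hist, size):
    """Extend an old histogram with zeros for words added later."""
    return hist + [0] * (size - len(hist))


# Toy data: two initial words, one old image, one image from a new batch.
vocab = [(0.0, 0.0), (10.0, 0.0)]
old_image = [(0.5, 0.2), (9.8, 0.1), (0.1, -0.3)]
old_hist = histogram(vocab, old_image)

new_image = [(0.2, 0.1), (0.0, 10.0), (0.3, 9.7)]
added = grow_vocabulary(vocab, new_image, radius=2.0)
new_hist = histogram(vocab, new_image)
old_hist_padded = pad(old_hist, len(vocab))
```

In this toy run, one descriptor of the new image is far from both initial words, so the vocabulary grows from two words to three, and the old image's histogram is zero-padded so that old and new images share one word index. A topic model such as incremental pLSA would then need its word distributions extended to cover the added word; that step is not shown here.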


Keywords: Visual clustering, Bag-of-words, Incremental pLSA



This work was supported by the National High Technology Research and Development Program of China (No. 2008AA02Z310), Shanghai Committee of Science and Technology (No. 08411951200, No. 08JG05002), 973 (2009CB320901) and NLPR (09-4-1).



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
  2. Department of Diagnostic and Interventional Radiology, Affiliated Sixth People’s Hospital, Shanghai Jiao Tong University, Shanghai, China
