Object Categorization Based on a Supervised Mean Shift Algorithm

  • Ruo Du
  • Qiang Wu
  • Xiangjian He
  • Jie Yang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7585)


In this work, we present a C++ implementation of object categorization within the bag-of-words (BoW) framework. Unlike typical BoW models, which treat the whole image as the region of interest (ROI) for visual codebook generation, our implementation treats only the regions of the target objects as ROIs and excludes unrelated background when generating the codebook. This is achieved with a supervised mean shift algorithm. Our experiments use the benchmark SIVAL dataset, and classification is performed with a Maximum Margin Supervised Topic Model. The final performance is encouraging.
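The idea of restricting the BoW representation to object regions can be illustrated with a minimal C++ sketch. This is not the authors' code: the `Feature` struct, its `in_roi` flag, and the function names are hypothetical, and the flag is assumed to come from an upstream localization step (in the paper, the supervised mean shift, which is not reproduced here). The codebook is assumed to have been built already, for example by clustering ROI descriptors. Each ROI descriptor is assigned to its nearest codeword, and background descriptors are skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical local feature: a descriptor vector plus a flag that says
// whether the feature lies inside the target-object ROI. The flag is an
// assumption of this sketch; it stands in for the output of an ROI step.
struct Feature {
    std::vector<double> descriptor;
    bool in_roi;
};

// Squared Euclidean distance between two vectors of equal length.
double squared_distance(const std::vector<double>& a,
                        const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Index of the codeword closest to the descriptor (codebook must be non-empty).
std::size_t nearest_codeword(const std::vector<double>& descriptor,
                             const std::vector<std::vector<double>>& codebook) {
    std::size_t best = 0;
    double best_dist = std::numeric_limits<double>::max();
    for (std::size_t k = 0; k < codebook.size(); ++k) {
        const double dist = squared_distance(descriptor, codebook[k]);
        if (dist < best_dist) {
            best_dist = dist;
            best = k;
        }
    }
    return best;
}

// Visual-word histogram that counts only features flagged as inside the ROI.
std::vector<int> roi_bow_histogram(
        const std::vector<Feature>& features,
        const std::vector<std::vector<double>>& codebook) {
    std::vector<int> histogram(codebook.size(), 0);
    if (codebook.empty()) {
        return histogram;  // nothing to quantize against
    }
    for (const Feature& f : features) {
        if (!f.in_roi) {
            continue;  // background feature: excluded from the representation
        }
        ++histogram[nearest_codeword(f.descriptor, codebook)];
    }
    return histogram;
}
```

Calling `roi_bow_histogram(features, codebook)` returns one count per codeword. A typical whole-image BoW model would correspond to setting `in_roi` to `true` for every feature.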


Keywords: Target Object, Visual Word, Latent Dirichlet Allocation, Object Categorization, Feature Code



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Ruo Du (1)
  • Qiang Wu (1)
  • Xiangjian He (1)
  • Jie Yang (2)

  1. University of Technology, Sydney, Australia
  2. Shanghai Jiaotong University, China
