Multimedia Tools and Applications, Volume 74, Issue 2, pp 543–559

Boosted MIML method for weakly-supervised image semantic segmentation



Weakly-supervised image semantic segmentation aims to segment images into semantically consistent regions when only image-level labels are available, and is of great significance for fine-grained image analysis, retrieval and other applications. In this paper, we propose a Boosted Multi-Instance Multi-Label (BMIML) learning method to address this problem. The approach is built upon the following principles. We formulate the image semantic segmentation task as a MIML problem under the boosting framework, where the goal is to simultaneously split the superpixels obtained from over-segmented images into groups and train one classifier for each group. A loss function that uses the image-level labels as weakly-supervised constraints is employed to assign suitable semantic labels to these classifiers. At the same time, a contextual loss term is added to reduce the ambiguities in the training data. In each boosting round, we introduce an “objectness” measure to jointly reweight the instances, in order to overcome the disturbance from highly frequent background superpixels. We demonstrate that BMIML outperforms state-of-the-art methods for weakly-supervised semantic segmentation on two widely used datasets, MSRC and LabelMe.
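
The abstract does not give the loss function, so the following is only a minimal sketch of how an image-level label can constrain superpixel-level scores and how an objectness value might reweight instances in one boosting round. It assumes a noisy-OR bag model, a common choice in multiple-instance boosting and not necessarily the one used in BMIML. The function names `bag_probability` and `instance_weights` and the multiplicative use of objectness are hypothetical. In the multi-label setting, the same computation would be repeated once per label.

```python
import numpy as np


def bag_probability(instance_scores):
    """Noisy-OR (assumed model): an image carries a label if at least
    one of its superpixels does."""
    p = 1.0 / (1.0 + np.exp(-np.asarray(instance_scores, dtype=float)))
    return 1.0 - np.prod(1.0 - p)


def instance_weights(instance_scores, bag_label, objectness):
    """Gradient of the bag log-likelihood with respect to each superpixel
    score, scaled by a per-superpixel objectness value (hypothetical
    reweighting; the paper's exact scheme is not stated in the abstract)."""
    scores = np.asarray(instance_scores, dtype=float)
    p = 1.0 / (1.0 + np.exp(-scores))
    # Clip to avoid division by zero when every instance score is very low.
    p_bag = np.clip(1.0 - np.prod(1.0 - p), 1e-12, 1.0 - 1e-12)
    if bag_label == 1:
        grad = p * (1.0 - p_bag) / p_bag  # push likely instances up
    else:
        grad = -p                         # push every instance down
    return grad * np.asarray(objectness, dtype=float)


if __name__ == "__main__":
    scores = [0.0, 0.0]      # two superpixels with neutral scores
    objectness = [1.0, 0.5]  # the second superpixel looks like background
    print(bag_probability(scores))
    print(instance_weights(scores, 1, objectness))
```

Under these assumptions, a superpixel with low objectness contributes a smaller weight to the next weak learner, which is one way frequent background regions could be kept from dominating training.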


Keywords: MIML, Weakly-supervised, Semantic segmentation, Objectness



This work was supported by the 973 Program (2010CB327905), the National Natural Science Foundation of China (61272329, 61070104, 61202325) and the Open Projects Program of the National Laboratory of Pattern Recognition.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
  2. School of Computer Science, Nanjing University of Science and Technology, Nanjing, China
