Multimedia Tools and Applications, Volume 60, Issue 2, pp 305–326

Segmentation-based multi-class semantic object detection

  • Remi Vieux
  • Jenny Benois-Pineau
  • Jean-Philippe Domenger
  • Achille Braquelaire


In this paper we study the detection of semantic objects from known categories in images. Unlike existing techniques, which perform recognition at the pixel or patch level, we propose to rely on the categorization of image segments. Recent work has highlighted that image segments provide sound support for visual object class recognition. In this work, we use image segments as primitives to extract robust features and train detection models for a predefined set of categories. Several segmentation algorithms are benchmarked and their performance for segment recognition is compared. We then propose two methods for enhancing segment classification: one based on the fusion of the classification results obtained with the different segmentations, the other based on the optimization of the global labelling by correcting local ambiguities between neighboring segments. We use the Microsoft MSRC-21 image database as a benchmark and show that our method competes with the current state of the art.
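The two enhancement steps named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the fusion rule or the compatibility model, so the code below makes assumptions: fusion is a plain average of per-pixel class probabilities across segmentations, and the label correction uses the classic relaxation-labelling update of Rosenfeld, Hummel and Zucker (1976) on a segment adjacency graph. The names `late_fusion`, `relaxation_labelling`, `neighbors` and `compat` are hypothetical and are not taken from the paper.

```python
import numpy as np


def late_fusion(prob_maps):
    """Average per-pixel class probabilities from several segmentations.

    prob_maps: list of arrays of shape (H, W, C), one per segmentation.
    Each pixel carries the class probabilities of the segment it lies in.
    Averaging is an assumed fusion rule, not necessarily the paper's.
    """
    fused = np.mean(np.stack(prob_maps, axis=0), axis=0)
    # Renormalize to guard against rounding drift.
    return fused / fused.sum(axis=-1, keepdims=True)


def relaxation_labelling(probs, neighbors, compat, n_iter=10):
    """Iteratively adjust segment labels using support from neighbors.

    probs:     (N, C) initial class probabilities, one row per segment.
    neighbors: list of lists; neighbors[i] holds indices adjacent to i.
    compat:    (C, C) compatibility coefficients, assumed to lie in [0, 1]
               here so that the update never produces negative values.
    """
    p = np.asarray(probs, dtype=float).copy()
    for _ in range(n_iter):
        q = np.zeros_like(p)
        for i, nbrs in enumerate(neighbors):
            if nbrs:
                # Support for each label at segment i: the mean, over its
                # neighbors, of the compatibility-weighted probabilities.
                q[i] = np.mean([compat @ p[j] for j in nbrs], axis=0)
        # Classic update: boost labels that neighbors support, renormalize.
        p = p * (1.0 + q)
        p /= p.sum(axis=1, keepdims=True)
    return p


# Toy example: three segments in a chain, two classes. The middle segment
# is ambiguous and its two neighbors are confident about class 0.
initial = np.array([[0.9, 0.1], [0.45, 0.55], [0.9, 0.1]])
adjacency = [[1], [0, 2], [1]]
same_label_compat = np.eye(2)  # assumed: only identical labels support each other
relaxed = relaxation_labelling(initial, adjacency, same_label_compat)
labels = relaxed.argmax(axis=1)
```

In this toy setting the ambiguous middle segment is pulled toward the label of its neighbors. A real system would derive the compatibility coefficients from training data, for example from label co-occurrence statistics between adjacent segments, which is again an assumption rather than a detail given in the abstract.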


Keywords: Object detection · Segmentation · Relaxation labelling · Late fusion · SVM



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Remi Vieux (1)
  • Jenny Benois-Pineau (1)
  • Jean-Philippe Domenger (1)
  • Achille Braquelaire (1)

  1. LaBRI, CNRS UMR 5800, Talence Cedex, France
