Supervised Label Transfer for Semantic Segmentation of Street Scenes

  • Honghui Zhang
  • Jianxiong Xiao
  • Long Quan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6315)

Abstract

In this paper, we propose a robust supervised label transfer method for the semantic segmentation of street scenes. Given an input image of a street scene, we first retrieve multiple image sets from a training database of annotated images, each of which covers all semantic categories in the input image. We then establish dense correspondences between the input image and each retrieved image set with a proposed KNN-MRF matching scheme. Next, a correspondence classification step aims to reduce the number of semantically incorrect correspondences, using classification models trained for the different categories. From the correspondences classified as semantically correct, we infer the confidence of each superpixel belonging to each semantic category, and integrate these confidences with a spatial smoothness constraint in a Markov random field to segment the input image. Experiments on three datasets show that our method outperforms traditional learning-based methods and the previous nonparametric label transfer method for the semantic segmentation of street scenes.
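
The last step of the pipeline, turning accepted correspondences into per-superpixel label confidences and then smoothing them in a Markov random field, can be illustrated with a small sketch. This is not the authors' implementation: the `(superpixel, label, weight)` match format, the function names, the Potts smoothness term, and the use of iterated conditional modes (ICM) as the optimizer are all assumptions made for illustration.

```python
import numpy as np

def label_confidences(matches, num_superpixels, num_labels):
    """Accumulate per-superpixel label votes from accepted correspondences.

    matches: iterable of (superpixel_index, label, weight) tuples.
    This tuple format is hypothetical, chosen only for the sketch.
    """
    conf = np.zeros((num_superpixels, num_labels))
    for sp, label, weight in matches:
        conf[sp, label] += weight
    totals = conf.sum(axis=1, keepdims=True)
    # Superpixels with no accepted match fall back to a uniform distribution.
    return np.where(totals > 0, conf / np.maximum(totals, 1e-12), 1.0 / num_labels)

def segment_icm(conf, edges, smooth_weight=0.5, iters=10):
    """Approximately minimize a Potts-model MRF energy with ICM.

    Energy (assumed form): sum_i -log(conf[i, l_i])
                           + smooth_weight * sum_(i,j) [l_i != l_j]
    edges: list of (i, j) pairs of adjacent superpixels.
    """
    unary = -np.log(conf + 1e-6)
    labels = unary.argmin(axis=1)
    neighbors = [[] for _ in range(len(conf))]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    for _ in range(iters):
        changed = False
        for i in range(len(conf)):
            cost = unary[i].copy()
            for j in neighbors[i]:
                # Potts penalty: every label except the neighbor's pays smooth_weight.
                cost += smooth_weight
                cost[labels[j]] -= smooth_weight
            best = int(cost.argmin())
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels

# Three superpixels in a chain; the middle one weakly prefers label 1.
matches = [(0, 0, 1.0), (2, 0, 1.0), (1, 0, 0.45), (1, 1, 0.55)]
conf = label_confidences(matches, num_superpixels=3, num_labels=2)
print(segment_icm(conf, edges=[(0, 1), (1, 2)], smooth_weight=0.0))
print(segment_icm(conf, edges=[(0, 1), (1, 2)], smooth_weight=0.5))
```

With the smoothness weight set to zero, each superpixel simply takes its most confident label. With a positive weight, the weakly supported middle superpixel is pulled toward the label of its two neighbors, which is the effect the spatial smoothness constraint is meant to have.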

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Honghui Zhang (1)
  • Jianxiong Xiao (2)
  • Long Quan (1)

  1. The Hong Kong University of Science and Technology
  2. Massachusetts Institute of Technology