Road Scene Segmentation from a Single Image

  • Jose M. Alvarez
  • Theo Gevers
  • Yann LeCun
  • Antonio M. Lopez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7578)


Road scene segmentation is important in computer vision for applications such as autonomous driving and pedestrian detection. Recovering the 3D structure of road scenes provides contextual information that improves their understanding.

In this paper, we use a convolutional neural network based algorithm to learn features from noisy labels in order to recover the 3D scene layout of a road image. The novelty of the algorithm lies in generating training labels by applying an algorithm trained on a general image dataset to classify on-board images. Further, we propose a novel texture descriptor based on a learned color plane fusion to obtain maximal uniformity in road areas. Finally, acquired (off-line) and current (on-line) information are combined to detect road areas in single images.
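The color plane fusion idea can be illustrated with a minimal sketch. It assumes, as one possible reading of "maximal uniformity", that the fusion is a linear combination of color planes whose weights sum to one and minimize the variance of the fused values over sampled road pixels. The abstract does not specify the formulation, so the function names, the sum-to-one constraint, and the `ridge` regularizer below are all illustrative assumptions rather than the authors' method.

```python
import numpy as np


def min_variance_weights(road_pixels, ridge=1e-6):
    """Hypothetical helper: weights (summing to 1) that minimize the
    variance of a linear combination of color planes over road pixels.

    road_pixels: array of shape (N, C), one row per sampled road pixel,
                 one column per color plane.
    """
    x = np.asarray(road_pixels, dtype=float)
    # Covariance between color planes, shape (C, C).
    cov = np.atleast_2d(np.cov(x, rowvar=False))
    # Small ridge term (an assumption) keeps the system well conditioned.
    cov = cov + ridge * np.eye(cov.shape[0])
    # Closed-form solution of: min w^T cov w  subject to  sum(w) = 1.
    w = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return w / w.sum()


def fuse_planes(image, weights):
    """Combine an (H, W, C) image into a single (H, W) plane."""
    return np.tensordot(image, weights, axes=([-1], [0]))
```

In this sketch the weights would be estimated from pixels assumed to belong to the road and then applied to the whole image, so that road areas appear as uniform as possible in the fused plane.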

Quantitative and qualitative experiments on publicly available datasets show that convolutional neural networks are suitable for learning 3D scene layout from noisy labels, providing a relative improvement of 7% over the baseline. Furthermore, combining color planes provides a statistical description of road areas that exhibits maximal uniformity and yields a relative improvement of 8% over the baseline. Finally, the improvement is even larger when acquired and current information from a single image are combined.
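For reference, a relative improvement is conventionally measured against the baseline score, as sketched below. The symbols $s_{\text{method}}$ and $s_{\text{baseline}}$ are placeholders introduced here; the abstract does not state which performance measure they correspond to, and the formula assumes a measure where higher is better.

```latex
\[
\text{relative improvement}
  = \frac{s_{\text{method}} - s_{\text{baseline}}}{s_{\text{baseline}}} \times 100\%
\]
```

Under this reading, a 7% relative improvement means the score is 1.07 times the baseline score, which is generally not the same as a gain of 7 percentage points.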


Keywords (machine-generated, not supplied by the authors): Local Binary Pattern, Single Image, Texture Descriptor, Convolutional Neural Network, Pedestrian Detection



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Jose M. Alvarez (1, 3)
  • Theo Gevers (2, 3)
  • Yann LeCun (1)
  • Antonio M. Lopez (3)

  1. Courant Institute of Mathematical Sciences, New York University, New York, USA
  2. Faculty of Science, University of Amsterdam, Amsterdam, The Netherlands
  3. Computer Vision Center, Univ. Autònoma de Barcelona, Barcelona, Spain
