Road Scene Segmentation from a Single Image
Road scene segmentation is an important computer vision task for applications such as autonomous driving and pedestrian detection. Recovering the 3D structure of road scenes provides relevant contextual information that improves their understanding.
In this paper, we use a convolutional neural network-based algorithm to learn features from noisy labels and recover the 3D scene layout of a road image. The novelty of the algorithm lies in generating the training labels by applying an algorithm trained on a general image dataset to classify on-board images. Further, we propose a novel texture descriptor based on a learned color plane fusion that maximizes uniformity in road areas. Finally, acquired (off-line) and current (on-line) information are combined to detect road areas in single images.
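The abstract does not spell out how the color planes are fused. The following is a minimal sketch under stated assumptions: the fused plane is a convex linear combination of the R, G and B planes, and the weights are chosen by grid search to maximize the histogram uniformity U = Σ_i p(z_i)², the statistical texture measure described by Gonzalez and Woods, over pixels labeled as road. The function names (`uniformity`, `fuse_planes`, `learn_fusion_weights`), the 0.1 grid step, the 256-bin histogram and the toy image are all hypothetical. In the actual descriptor, the fused plane would presumably feed a texture operator such as a Local Binary Pattern.

```python
import numpy as np


def uniformity(values, bins=256, value_range=(0.0, 255.0)):
    """Histogram uniformity U = sum_i p(z_i)^2; equals 1.0 when all values share one bin."""
    # Clip so round-off (e.g. 255.00000000000003) cannot push a value outside the histogram range.
    values = np.clip(np.asarray(values, dtype=np.float64), *value_range)
    hist, _ = np.histogram(values, bins=bins, range=value_range)
    p = hist / hist.sum()  # normalized histogram p(z_i)
    return float(np.sum(p ** 2))


def fuse_planes(image, weights):
    """Fuse the R, G, B planes of an HxWx3 image into one HxW plane (linear combination)."""
    return image.astype(np.float64) @ np.asarray(weights, dtype=np.float64)


def learn_fusion_weights(image, road_mask, step=0.1):
    """Grid-search convex weights (w_r + w_g + w_b = 1, all >= 0) that maximize road uniformity."""
    road_pixels = image[road_mask].astype(np.float64)  # (N, 3) colors of road pixels
    grid = np.round(np.arange(0.0, 1.0 + 1e-9, step), 10)
    best_w, best_u = None, -1.0
    for wr in grid:
        for wg in grid:
            wb = 1.0 - wr - wg
            if wb < -1e-9:  # weights outside the simplex are skipped
                continue
            w = np.array([wr, wg, max(wb, 0.0)])
            u = uniformity(road_pixels @ w)
            if u > best_u:
                best_w, best_u = w, u
    return best_w, best_u


# Hypothetical toy image: random colors everywhere, but on the "road" (bottom half)
# the green plane is constant, so projecting onto green alone is perfectly uniform there.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(60, 80, 3)).astype(np.uint8)
road_mask = np.zeros((60, 80), dtype=bool)
road_mask[30:, :] = True
image[road_mask, 1] = 100

# In this toy setup the search should select w = [0, 1, 0] with U = 1.0.
w, u = learn_fusion_weights(image, road_mask)
fused = fuse_planes(image, w)  # single plane that a texture descriptor (e.g. LBP) could consume
print(w, u)
```

A grid search over the weight simplex keeps the sketch transparent; a real system would more likely optimize the weights over many labeled training images rather than one.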
From quantitative and qualitative experiments conducted on publicly available datasets, we conclude that convolutional neural networks are suitable for learning 3D scene layout from noisy labels, providing a relative improvement of 7% over the baseline. Furthermore, combining color planes yields a statistical description of road areas that exhibits maximal uniformity, providing a relative improvement of 8% over the baseline. Finally, the improvement is even larger when acquired and current information from a single image are combined.
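The abstract reports relative rather than absolute gains. Assuming the standard definition of relative improvement (the evaluation measure F and the numbers below are hypothetical, not taken from the paper), the gain is expressed as a fraction of the baseline score:

```latex
\Delta_{\mathrm{rel}} = \frac{F_{\mathrm{method}} - F_{\mathrm{baseline}}}{F_{\mathrm{baseline}}},
\qquad
F_{\mathrm{baseline}} = 0.80,\; \Delta_{\mathrm{rel}} = 0.07
\;\Rightarrow\;
F_{\mathrm{method}} = 0.80 \times 1.07 = 0.856 .
```

Under these illustrative numbers, a 7% relative improvement corresponds to 5.6 percentage points in absolute terms.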
Keywords: Local Binary Pattern, Single Image, Texture Descriptor, Convolutional Neural Network, Pedestrian Detection