Learning Where to Drive by Watching Others

Bautista, Miguel A.; Fuchs, Patrick; Ommer, Björn

doi:10.1007/978-3-319-66709-6_3

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10496)

Included in the following conference series:

German Conference on Pattern Recognition

Abstract

The most prominent approach for autonomous cars to learn what areas of a scene are drivable is to utilize tedious human supervision in the form of pixel-wise image labeling for training deep semantic segmentation algorithms. However, the underlying CNNs require vast amounts of this training information, rendering the expensive pixel-wise labeling of images a bottleneck. Thus, we propose a self-supervised approach that is able to utilize the myriad of easily available dashcam videos from YouTube or from autonomous vehicles to perform fully automatic training by simply watching others drive. We play training videos backwards in time and track patches that cars have driven over together with their spatio-temporal interrelations, which are a rich source of context information. Collecting large numbers of these local regions enables fully automatic self-supervision for training a CNN. The proposed method has the potential to extend and complement the popular supervised CNN learning of drivable pixels by using a rich, presently untapped source of unlabeled training data.
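The backward-tracking idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the tracker is a plain exhaustive block matcher chosen for brevity, the function names `track_patch` and `collect_driven_patches` are hypothetical, and the "video" is a synthetic texture that slides down the image by 2 pixels per frame. The assumption made here is that the patch at the bottom of the last frame is the one the car has driven over, so following it backwards in time yields its earlier positions further up the image.

```python
import numpy as np


def track_patch(src_frame, dst_frame, top_left, size, search=4):
    """Locate the patch at `top_left` of `src_frame` inside `dst_frame`.

    Exhaustive block matching with a sum-of-squared-differences score,
    searched over a (2 * search + 1)^2 window. Illustrative only.
    """
    y, x = top_left
    h, w = size
    template = src_frame[y:y + h, x:x + w].astype(np.float64)
    best_score, best_pos = np.inf, top_left
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ny, nx = y + dy, x + dx
            # Skip candidate windows that leave the image.
            if ny < 0 or nx < 0:
                continue
            if ny + h > dst_frame.shape[0] or nx + w > dst_frame.shape[1]:
                continue
            candidate = dst_frame[ny:ny + h, nx:nx + w].astype(np.float64)
            score = np.sum((candidate - template) ** 2)
            if score < best_score:
                best_score, best_pos = score, (ny, nx)
    return best_pos


def collect_driven_patches(frames, seed_top_left, size, search=4):
    """Play the video backwards and record where the seed patch was.

    `seed_top_left` is the patch position in the LAST frame (assumed to be
    ground the car has driven over). Returns {frame_index: (row, col)}.
    """
    last = len(frames) - 1
    positions = {last: seed_top_left}
    pos = seed_top_left
    for t in range(last, 0, -1):  # t = last, ..., 1 (backwards in time)
        pos = track_patch(frames[t], frames[t - 1], pos, size, search)
        positions[t - 1] = pos
    return positions


# Synthetic "dashcam" clip: a random texture moving 2 rows down per frame,
# mimicking road surface approaching the camera.
rng = np.random.default_rng(0)
base = rng.random((64, 64))
frames = [np.roll(base, shift=2 * t, axis=0) for t in range(5)]

positions = collect_driven_patches(frames, seed_top_left=(40, 20), size=(8, 8))
for t in sorted(positions):
    print(t, positions[t])
```

In this toy clip the patch is recovered 2 rows higher at each backward step, so every earlier frame contributes one automatically labeled "driven over" region. A real pipeline would need a robust tracker and a way to reject lost tracks, neither of which is shown here.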

Notes

1. Our approach runs at 15 FPS on an NVIDIA Titan X GPU.
2. https://hcicloud.iwr.uni-heidelberg.de/index.php/s/tutGQ2J3XoUyqkU.

Author information

Authors and Affiliations

Heidelberg Collaboratory for Image Processing IWR, Heidelberg University, Heidelberg, Germany
Miguel A. Bautista, Patrick Fuchs & Björn Ommer

Corresponding author

Correspondence to Miguel A. Bautista.

Editor information

Editors and Affiliations

University of Basel, Basel, Switzerland
Volker Roth
University of Basel, Basel, Switzerland
Thomas Vetter

About this paper

Cite this paper

Bautista, M.A., Fuchs, P., Ommer, B. (2017). Learning Where to Drive by Watching Others. In: Roth, V., Vetter, T. (eds) Pattern Recognition. GCPR 2017. Lecture Notes in Computer Science, vol 10496. Springer, Cham. https://doi.org/10.1007/978-3-319-66709-6_3

DOI: https://doi.org/10.1007/978-3-319-66709-6_3
Published: 15 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66708-9
Online ISBN: 978-3-319-66709-6
eBook Packages: Computer Science, Computer Science (R0)
