Skip to main content

Learning Where to Drive by Watching Others

  • Conference paper
  • First Online:
Pattern Recognition (GCPR 2017)

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 10496))

Included in the following conference series:

Abstract

The most prominent approach for autonomous cars to learn what areas of a scene are drivable is to utilize tedious human supervision in the form of pixel-wise image labeling for training deep semantic segmentation algorithms. However, the underlying CNNs require vast amounts of this training information, rendering the expensive pixel-wise labeling of images a bottleneck. Thus, we propose a self-supervised approach that is able to utilize the myriad of easily available dashcam videos from YouTube or from autonomous vehicles to perform fully automatic training by simply watching others drive. We play training videos backwards in time and track patches that cars have driven over together with their spatio-temporal interrelations, which are a rich source of context information. Collecting large numbers of these local regions enables fully automatic self-supervision for training a CNN. The proposed method has the potential to extend and complement the popular supervised CNN learning of drivable pixels by using a rich, presently untapped source of unlabeled training data.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    Our approach runs at 15 FPS on a NVIDIA Titan X GPU.

  2. 2.

    https://hcicloud.iwr.uni-heidelberg.de/index.php/s/tutGQ2J3XoUyqkU.

References

  1. Agrawal, P., Carreira, J., Malik, J.: Learning to see by moving. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 37–45 (2015)

    Google Scholar 

  2. Alvarez, J.M., Gevers, T., LeCun, Y., Lopez, A.M.: Road scene segmentation from a single image. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7578, pp. 376–389. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33786-4_28

    Chapter  Google Scholar 

  3. Bautista, M.A., Sanakoyeu, A., Ommer, B.: Deep unsupervised similarity learning using partially ordered sets. In: Proceedings of IEEE Computer Vision and Pattern Recognition (2017)

    Google Scholar 

  4. Bautista, M.A., Sanakoyeu, A., Tikhoncheva, E., Ommer, B.: Cliquecnn: deep unsupervised exemplar learning. In: Advances In Neural Information Processing Systems, pp. 3846–3854 (2016)

    Google Scholar 

  5. Boyd, E.M., Fales, A.W.: Reflective learning key to learning from experience. J. Humanistic Psychol. 23(2), 99–117 (1983)

    Article  Google Scholar 

  6. Chan, F.-H., Chen, Y.-T., Xiang, Y., Sun, M.: Anticipating accidents in dashcam videos. In: Lai, S.-H., Lepetit, V., Nishino, K., Sato, Y. (eds.) ACCV 2016. LNCS, vol. 10114, pp. 136–153. Springer, Cham (2017). doi:10.1007/978-3-319-54190-7_9

    Chapter  Google Scholar 

  7. Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. arXiv preprint arXiv:1606.00915 (2016)

  8. Chen, X., Kundu, K., Zhu, Y., Berneshawi, A.G., Ma, H., Fidler, S., Urtasun, R.: 3D object proposals for accurate object class detection. In: Advances in NIPS, pp. 424–432 (2015)

    Google Scholar 

  9. Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE CVPR, pp. 3213–3223 (2016)

    Google Scholar 

  10. Dahlkamp, H., Kaehler, A., Stavens, D., Thrun, S., Bradski, G.R.: Self-supervised monocular road detection in desert terrain. In: Robotics: Science and Systems, vol. 38. Philadelphia (2006)

    Google Scholar 

  11. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, pp. 248–255. IEEE (2009)

    Google Scholar 

  12. Doersch, C., Gupta, A., Efros, A.A.: Unsupervised visual representation learning by context prediction. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1422–1430 (2015)

    Google Scholar 

  13. Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981)

    Article  MathSciNet  Google Scholar 

  14. Fritsch, J., Kuehnl, T., Geiger, A.: A new performance measure and evaluation benchmark for road detection algorithms. In: International Conference on Intelligent Transportation Systems (ITSC) (2013)

    Google Scholar 

  15. Gaidon, A., Wang, Q., Cabon, Y., Vig, E.: Virtual worlds as proxy for multi-object tracking analysis. CoRR abs/1605.06457 (2016). http://arxiv.org/abs/1605.06457

  16. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE CVPR, pp. 580–587 (2014)

    Google Scholar 

  17. Guo, C., Mita, S., McAllester, D.: MRF-based road detection with unsupervised learning for autonomous driving in changing environments. In: 2010 IEEE Intelligent Vehicles Symposium (IV), pp. 361–368. IEEE (2010)

    Google Scholar 

  18. Guo, C., Yamabe, T., Mita, S.: Robust road boundary estimation for intelligent vehicles in challenging scenarios based on a semantic graph. In: 2012 IEEE Intelligent Vehicles Symposium (IV), pp. 37–44. IEEE (2012)

    Google Scholar 

  19. He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 630–645. Springer, Cham (2016). doi:10.1007/978-3-319-46493-0_38

    Chapter  Google Scholar 

  20. Hillel, A.B., Lerner, R., Levi, D., Raz, G.: Recent progress in road and lane detection: a survey. Mach. Vis. Appl. 25(3), 727–745 (2014)

    Article  Google Scholar 

  21. Jin, J., Fu, K., Zhang, C.: Traffic sign recognition with hinge loss trained convolutional neural networks. IEEE Transp. Intell. Trans. Syst. 15(5), 1991–2000 (2014)

    Article  Google Scholar 

  22. Kolb, D.A.: Experiential Learning: Experience as the Source of Learning and Development. FT Press, Upper Saddle River (2014)

    Google Scholar 

  23. Liu, W., Rabinovich, A., Berg, A.C.: Parsenet: looking wider to see better. arXiv preprint arXiv:1506.04579 (2015)

  24. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)

    Google Scholar 

  25. Lu, M., Wevers, K., Van Der Heijden, R.: Technical feasibility of advanced driver assistance systems (ADAS) for road traffic safety. Transp. Plan. Technol. 28(3), 167–187 (2005)

    Article  Google Scholar 

  26. Lucas, B.D., Kanade, T., et al.: An iterative image registration technique with an application to stereo vision. IJCAI 81, 674–679 (1981)

    Google Scholar 

  27. Meltzoff, A.N., Brooks, R.: Self-experience as a mechanism for learning about others: a training study in social cognition. Dev. Psychol. 44(5), 1257 (2008)

    Article  Google Scholar 

  28. Richter, S.R., Vineet, V., Roth, S., Koltun, V.: Playing for data: ground truth from computer games. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9906, pp. 102–118. Springer, Cham (2016). doi:10.1007/978-3-319-46475-6_7

    Chapter  Google Scholar 

  29. Ros, G., Sellart, L., Materzynska, J., Vázquez, D., Lopez, A.M.: The SYNTHIA dataset: a large collection of synthetic images for semantic segmentation of urban scenes. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, 27–30 June 2016, pp. 3234–3243 (2016)

    Google Scholar 

  30. Rother, C., Kolmogorov, V., Blake, A.: Grabcut: interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. (TOG) 23, 309–314 (2004)

    Article  Google Scholar 

  31. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE CVPR, pp. 2818–2826 (2016)

    Google Scholar 

  32. Wang, X., Gupta, A.: Unsupervised learning of visual representations using videos. In: ICCV (2015)

    Google Scholar 

  33. Wang, X., Gupta, A.: Unsupervised learning of visual representations using videos. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2794–2802 (2015)

    Google Scholar 

  34. Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015)

  35. Zhang, W.: Lidar-based road and road-edge detection. In: 2010 IEEE Intelligent Vehicles Symposium (IV), pp. 845–848. IEEE (2010)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Miguel A. Bautista .

Editor information

Editors and Affiliations

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 59347 KB)

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Bautista, M.A., Fuchs, P., Ommer, B. (2017). Learning Where to Drive by Watching Others. In: Roth, V., Vetter, T. (eds) Pattern Recognition. GCPR 2017. Lecture Notes in Computer Science(), vol 10496. Springer, Cham. https://doi.org/10.1007/978-3-319-66709-6_3

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-66709-6_3

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66708-9

  • Online ISBN: 978-3-319-66709-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics