Visual Scene Understanding for Autonomous Driving Using Semantic Segmentation

Hofmarcher, Markus; Unterthiner, Thomas; Arjona-Medina, José; Klambauer, Günter; Hochreiter, Sepp; Nessler, Bernhard

doi:10.1007/978-3-030-28954-6_15

Chapter
First Online: 10 September 2019

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11700)

Abstract

Deep neural networks are an increasingly important technique for autonomous driving, especially as a visual perception component. Deployment in a real environment requires that the algorithms controlling the vehicle be explainable and inspectable. Such explanations matter not only for legal and insurance questions but also for engineers and developers who need to establish provable guarantees of functional quality. This applies to every scenario in which the output of a deep network controls a potentially life-threatening machine. We suggest a tiered approach, whose main component is a semantic segmentation model, in preference to an end-to-end approach for an autonomous driving system. For a system to give meaningful explanations of its decisions, it must be able to explain the semantics it attributes to the complex sensory inputs it perceives. For high-dimensional visual input, this attribution is a pixel-wise classification that assigns an object class to every pixel in the image. This process is called semantic segmentation.
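The pixel-wise classification described above can be illustrated with a minimal sketch. It is not the chapter's architecture: the class names, the `segment` helper, and the toy scores are all hypothetical, and the sketch only assumes that some model has already produced one score per class for every pixel.

```python
# Hypothetical class list; the chapter's actual label set is not given here.
CLASSES = ["road", "car", "pedestrian"]


def segment(scores):
    """Assign each pixel the index of its highest-scoring class.

    scores[y][x] is a list holding one score per class for the pixel
    in row y, column x. Ties go to the lowest class index.
    """
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


# Toy 2x2 "image" with made-up per-class scores (not real model output).
toy_scores = [
    [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1]],
    [[0.8, 0.1, 0.1], [0.1, 0.2, 0.7]],
]

label_map = segment(toy_scores)                           # class index per pixel
named = [[CLASSES[i] for i in row] for row in label_map]  # readable labels
```

The resulting label map has the same height and width as the input, which is what lets a downstream component inspect the class the system assigned to each image region.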

We propose an architecture that delivers segmentation performance viable for real-time use while respecting the limited computational power available in production vehicles. The output of such a semantic segmentation model can serve as input to an interpretable autonomous driving system.

M. Hofmarcher, T. Unterthiner and J. Arjona-Medina contributed equally to this work.

Acknowledgements

This work was supported by Audi.JKU Deep Learning Center, Audi Electronics Venture GmbH, Zalando SE with Research Agreement 01/2016, the Austrian Science Fund with Project P28660-N31 and NVIDIA Corporation.

Author information

Authors and Affiliations

Johannes Kepler University Linz, 4040, Linz, Austria
Markus Hofmarcher, Thomas Unterthiner, José Arjona-Medina, Günter Klambauer, Sepp Hochreiter & Bernhard Nessler

Corresponding author

Correspondence to Markus Hofmarcher.

Editor information

Editors and Affiliations

Fraunhofer Heinrich Hertz Institute, Berlin, Germany
Wojciech Samek
Technische Universität Berlin, Berlin, Germany
Grégoire Montavon
University of Oxford, Oxford, UK
Andrea Vedaldi
Technical University of Denmark, Kgs. Lyngby, Denmark
Lars Kai Hansen
Sekretariat MAR 4-1, Technical University of Berlin, Berlin, Berlin, Germany
Klaus-Robert Müller

About this chapter

Cite this chapter

Hofmarcher, M., Unterthiner, T., Arjona-Medina, J., Klambauer, G., Hochreiter, S., Nessler, B. (2019). Visual Scene Understanding for Autonomous Driving Using Semantic Segmentation. In: Samek, W., Montavon, G., Vedaldi, A., Hansen, L., Müller, KR. (eds) Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. Lecture Notes in Computer Science, vol 11700. Springer, Cham. https://doi.org/10.1007/978-3-030-28954-6_15

DOI: https://doi.org/10.1007/978-3-030-28954-6_15
Published: 10 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28953-9
Online ISBN: 978-3-030-28954-6
eBook Packages: Computer Science, Computer Science (R0)