Abstract
Multimodal deep learning is about learning feature representations over multiple sensor modalities. Impressive progress has been made in deep learning solutions that rely on a single sensor modality for advanced driving, but such approaches cover only limited functionality. The potential of multimodal sensor fusion remains largely unexploited, even though research vehicles are commonly equipped with several sensor types. How to combine their data to achieve a complete scene analysis, and thereby improve robustness in driving, is still an open question. While various surveys exist on intelligent vehicles or on deep learning, to date no survey on multimodal deep learning for advanced driving exists. This paper attempts to narrow this gap by providing the first review that analyzes the existing literature together with two indispensable elements: sensors and datasets. We also provide our insights on future challenges and work to be done.
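To make the idea of learning features over multiple modalities concrete, the following is a minimal, illustrative sketch of feature-level (early) fusion: features from two modality branches are concatenated into one joint representation before a shared prediction head. The feature vectors, dimensions, class count, and random weights here are all hypothetical stand-ins; in a real driving system each branch would be a learned network over camera images or LiDAR projections.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality feature extractors: in practice these would be
# CNN backbones over camera images and LiDAR range/bird's-eye-view maps.
camera_features = rng.standard_normal(128)  # image-branch output (assumed size)
lidar_features = rng.standard_normal(64)    # point-cloud-branch output (assumed size)

# Feature-level fusion: concatenate the modality features into a single
# joint representation, then apply a shared (here random, untrained) linear head.
fused = np.concatenate([camera_features, lidar_features])
weights = rng.standard_normal((3, fused.size))  # 3 hypothetical output classes
logits = weights @ fused
prediction = int(np.argmax(logits))

print(fused.shape, prediction)
```

Late fusion, by contrast, would run a separate head per modality and combine the per-modality outputs (e.g. by averaging logits); the surveyed literature explores both placements of the fusion step.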
Acknowledgments
This work has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement n° 6880099, project Cloud-LSVA).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this paper
Aranjuelo, N., Unzueta, L., Arganda-Carreras, I., Otaegui, O. (2018). Multimodal Deep Learning for Advanced Driving Systems. In: Perales, F., Kittler, J. (eds.) Articulated Motion and Deformable Objects. AMDO 2018. Lecture Notes in Computer Science, vol. 10945. Springer, Cham. https://doi.org/10.1007/978-3-319-94544-6_10
Print ISBN: 978-3-319-94543-9
Online ISBN: 978-3-319-94544-6