Abstract
Multimodal deep learning is about learning feature representations over multiple sensor modalities. Impressive progress has been made in deep learning solutions that rely on a single sensor modality for advanced driving, but such approaches cover only limited functionality. The potential of multimodal sensor fusion remains largely unexploited, even though research vehicles are commonly equipped with several sensor types. How to combine their data to achieve a complete scene analysis, and thereby improve robustness in driving, is still an open question. While various surveys exist on intelligent vehicles or on deep learning, to date no survey on multimodal deep learning for advanced driving exists. This paper attempts to narrow this gap by providing the first review that analyzes the existing literature together with two indispensable elements: sensors and datasets. We also provide our insights on future challenges and work to be done.
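To make the idea of learning features over multiple modalities concrete, the following is a minimal, illustrative sketch of feature-level (early) fusion: features from two modality branches are concatenated into one joint representation before a shared prediction head. The feature vectors, dimensions, class count, and random weights here are all hypothetical stand-ins; in a real driving system each branch would be a learned network over camera images or LiDAR projections.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality feature extractors: in practice these would be
# CNN backbones over camera images and LiDAR range/bird's-eye-view maps.
camera_features = rng.standard_normal(128)  # image-branch output (assumed size)
lidar_features = rng.standard_normal(64)    # point-cloud-branch output (assumed size)

# Feature-level fusion: concatenate the modality features into a single
# joint representation, then apply a shared (here random, untrained) linear head.
fused = np.concatenate([camera_features, lidar_features])
weights = rng.standard_normal((3, fused.size))  # 3 hypothetical output classes
logits = weights @ fused
prediction = int(np.argmax(logits))

print(fused.shape, prediction)
```

Late fusion, by contrast, would run a separate head per modality and combine the per-modality outputs (e.g. by averaging logits); the surveyed literature explores both placements of the fusion step.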
Acknowledgments
This work has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement n° 6880099, project Cloud-LSVA).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this paper
Aranjuelo, N., Unzueta, L., Arganda-Carreras, I., Otaegui, O. (2018). Multimodal Deep Learning for Advanced Driving Systems. In: Perales, F., Kittler, J. (eds.) Articulated Motion and Deformable Objects. AMDO 2018. Lecture Notes in Computer Science, vol. 10945. Springer, Cham. https://doi.org/10.1007/978-3-319-94544-6_10
Print ISBN: 978-3-319-94543-9
Online ISBN: 978-3-319-94544-6