Multimodal Deep Learning for Advanced Driving Systems

  • Conference paper

Articulated Motion and Deformable Objects (AMDO 2018)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10945)

Abstract

Multimodal deep learning is about learning features over multiple modalities. Impressive progress has been made in deep learning solutions that rely on a single sensor modality for advanced driving. However, these approaches are limited to certain functionalities. The potential of multimodal sensor fusion has been exploited very little, although research vehicles are commonly equipped with various sensor types. How to combine their data to achieve a complex scene analysis, and thereby improve robustness in driving, is still an open question. While surveys exist for intelligent vehicles and for deep learning, to date no survey on multimodal deep learning for advanced driving exists. This paper attempts to narrow this gap by providing the first review that analyzes the existing literature along with two indispensable elements: sensors and datasets. We also provide our insights on future challenges and work to be done.
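As a generic illustration of the feature-level fusion the abstract refers to (this sketch is not code from the paper; all shapes, layer choices, and sensor names are arbitrary assumptions), each modality can be encoded separately and the resulting feature vectors concatenated before a shared prediction head:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w):
    """Toy per-modality encoder: one linear layer followed by tanh."""
    return np.tanh(x @ w)

# Fake sensor inputs: a flattened camera patch and a LiDAR range profile.
camera = rng.standard_normal(64)
lidar = rng.standard_normal(32)

# Randomly initialized weights; a real system would learn these.
w_cam = rng.standard_normal((64, 16))
w_lid = rng.standard_normal((32, 16))
w_head = rng.standard_normal((32, 3))  # 16 + 16 fused features -> 3 classes

# Feature-level fusion: concatenate the per-modality embeddings,
# then apply a shared softmax classification head.
fused = np.concatenate([encode(camera, w_cam), encode(lidar, w_lid)])
logits = fused @ w_head
probs = np.exp(logits - logits.max())
probs /= probs.sum()

print(probs.shape)  # (3,)
```

The same pattern generalizes to learned fusion weights and to other fusion points (early, at the raw-input level, or late, at the decision level), which is one axis along which the surveyed approaches differ.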


Acknowledgments

This work has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement no. 6880099, project Cloud-LSVA).

Author information

Correspondence to Nerea Aranjuelo.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Aranjuelo, N., Unzueta, L., Arganda-Carreras, I., Otaegui, O. (2018). Multimodal Deep Learning for Advanced Driving Systems. In: Perales, F., Kittler, J. (eds) Articulated Motion and Deformable Objects. AMDO 2018. Lecture Notes in Computer Science, vol 10945. Springer, Cham. https://doi.org/10.1007/978-3-319-94544-6_10

  • DOI: https://doi.org/10.1007/978-3-319-94544-6_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-94543-9

  • Online ISBN: 978-3-319-94544-6

  • eBook Packages: Computer Science (R0)
