
Perspectives on Deep Multimodal Robot Learning

  • Conference paper
Robotics Research

Abstract

In the last decade, deep learning has revolutionized various components of the conventional robot autonomy stack, including aspects of perception, navigation, and manipulation. There have been numerous advances in perfecting individual tasks such as scene understanding, visual localization, end-to-end navigation, and grasping, which have given us a critical understanding of how to design architectures for a specific task. This brings us to the question of whether such disjoint learning of models for robotic tasks is effective in the real world and whether it is scalable. More generally, is training task-specific models on task-specific datasets beneficial to architecting robot intelligence as a whole? In this paper, we argue that multimodal learning and joint multi-task learning are effective strategies for enabling robots to excel across multiple domains. We describe how multimodal learning can facilitate generalization to unseen scenarios by utilizing domain-specific cues from auxiliary tasks, and we discuss some of the current mechanisms that can be employed to design multimodal frameworks for robot autonomy.
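To make the joint multi-task idea concrete, the following is a minimal illustrative sketch (not taken from the paper): a shared encoder feeds a main task head and an auxiliary task head, and both losses are optimized jointly so the auxiliary task can inject domain-specific cues into the shared representation. All module names, dimensions, and the loss weighting here are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskNet(nn.Module):
    """Shared encoder with one main and one auxiliary head (hypothetical example)."""

    def __init__(self, num_classes=10, aux_dim=1):
        super().__init__()
        # Shared convolutional encoder reused by every task head.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Main task head, e.g. scene classification.
        self.main_head = nn.Linear(64, num_classes)
        # Auxiliary task head, e.g. regressing a coarse depth statistic.
        self.aux_head = nn.Linear(64, aux_dim)

    def forward(self, x):
        z = self.encoder(x)
        return self.main_head(z), self.aux_head(z)

model = MultiTaskNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(4, 3, 64, 64)        # toy batch of RGB images
y_main = torch.randint(0, 10, (4,))  # main-task labels
y_aux = torch.randn(4, 1)            # auxiliary-task targets

logits, aux_pred = model(x)
# Joint objective: the auxiliary term regularizes the shared encoder,
# which is one way auxiliary cues can aid generalization.
loss = F.cross_entropy(logits, y_main) + 0.5 * F.mse_loss(aux_pred, y_aux)
loss.backward()
optimizer.step()
```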



Author information


Correspondence to Wolfram Burgard.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Burgard, W. et al. (2020). Perspectives on Deep Multimodal Robot Learning. In: Amato, N., Hager, G., Thomas, S., Torres-Torriti, M. (eds) Robotics Research. Springer Proceedings in Advanced Robotics, vol 10. Springer, Cham. https://doi.org/10.1007/978-3-030-28619-4_3
