Abstract
This paper addresses robot ego-motion estimation relying solely on acoustic sensing. By equipping a robot with microphones, we investigate whether the noise generated by the vehicle's motors and actuators can be used to estimate its motion. Audio-based odometry is unaffected by the scene's appearance, lighting conditions, and structure, making sound a compelling auxiliary source of information for ego-motion modelling in environments where more traditional methods, such as those based on visual or laser odometry, are particularly challenged. By leveraging multi-task learning and deep architectures, we provide a regression framework able to estimate the linear and angular velocities at which the robot has been travelling. Our experimental evaluation, conducted on approximately two hours of data collected with an unmanned outdoor field robot, demonstrates an absolute error lower than 0.07 m/s and 0.02 rad/s for the linear and angular velocity, respectively. Compared to a baseline approach using a single-task learning scheme, our system shows an improvement of up to 26% in ego-motion estimation.
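The core idea of the multi-task setup described above can be sketched as a network with a shared trunk and two task-specific regression heads, one per velocity component, trained with a combined loss. The sketch below is purely illustrative: the feature dimensions, layer sizes, and task weight `alpha` are assumptions for demonstration, not the architecture or hyperparameters used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for audio features (e.g. a flattened spectrogram frame);
# the dimensions here are illustrative, not those used in the paper.
x = rng.standard_normal((8, 64))           # batch of 8 feature vectors

# Shared trunk: one hidden layer whose representation feeds both tasks.
W_shared = rng.standard_normal((64, 32)) * 0.1
h = np.tanh(x @ W_shared)

# Task-specific linear heads: one regresses linear velocity (m/s),
# the other angular velocity (rad/s).
w_lin = rng.standard_normal((32, 1)) * 0.1
w_ang = rng.standard_normal((32, 1)) * 0.1
v_lin_pred = h @ w_lin
v_ang_pred = h @ w_ang

# Multi-task objective: a weighted sum of the two per-task MSE losses,
# so gradients through the shared trunk carry information from both tasks.
v_lin_true = rng.standard_normal((8, 1))
v_ang_true = rng.standard_normal((8, 1))
alpha = 0.5                                # task-weighting hyperparameter (assumed)
loss = (alpha * np.mean((v_lin_pred - v_lin_true) ** 2)
        + (1 - alpha) * np.mean((v_ang_pred - v_ang_true) ** 2))
print(float(loss) >= 0.0)                  # the combined loss is nonnegative
```

Sharing the trunk is what distinguishes this from the single-task baseline: both velocity estimates are forced to rely on a common learned representation of the ego-noise, which is the mechanism the paper credits for the reported improvement.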
Acknowledgements
This work was supported by the UK EPSRC Programme Grant EP/M019918/1.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this paper
Marchegiani, L., Newman, P. (2018). Learning to Listen to Your Ego-(motion): Metric Motion Estimation from Auditory Signals. In: Giuliani, M., Assaf, T., Giannaccini, M. (eds) Towards Autonomous Robotic Systems. TAROS 2018. Lecture Notes in Computer Science, vol 10965. Springer, Cham. https://doi.org/10.1007/978-3-319-96728-8_21
Print ISBN: 978-3-319-96727-1
Online ISBN: 978-3-319-96728-8