
Learning to Listen to Your Ego-(motion): Metric Motion Estimation from Auditory Signals

Conference paper, Towards Autonomous Robotic Systems (TAROS 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10965)
Abstract

This paper is about robot ego-motion estimation relying solely on acoustic sensing. By equipping a robot with microphones, we investigate the possibility of employing the noise generated by the motors and actuators of the vehicle to estimate its motion. Audio-based odometry is not affected by the scene’s appearance, lighting conditions, and structure. This makes sound a compelling auxiliary source of information for ego-motion modelling in environments where more traditional methods, such as those based on visual or laser odometry, are particularly challenged. By leveraging multi-task learning and deep architectures, we provide a regression framework able to estimate the linear and the angular velocity at which the robot has been travelling. Our experimental evaluation, conducted on approximately two hours of data collected with an unmanned outdoor field robot, demonstrated an absolute error lower than 0.07 m/s and 0.02 rad/s for the linear and angular velocity, respectively. When compared to a baseline approach making use of a single-task learning scheme, our system shows an improvement of up to 26% in ego-motion estimation.
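The multi-task formulation described in the abstract can be pictured as a single network with a shared audio-feature trunk and two regression heads, one per velocity component. The sketch below is a minimal illustration in Python/Keras; the input shape, layer sizes, loss weights, and names are assumptions made for clarity and do not reproduce the authors' architecture or training setup.

# Illustrative multi-task regression sketch (not the authors' architecture).
# Assumes microphone audio has already been converted into fixed-size
# time-frequency feature maps (e.g. outputs of a gammatone filter bank).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_multitask_model(input_shape=(64, 128, 1)):
    """Shared convolutional trunk with two regression heads:
    linear velocity (m/s) and angular velocity (rad/s)."""
    inputs = layers.Input(shape=input_shape)

    # Shared feature extractor over the time-frequency representation.
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(x)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dropout(0.5)(x)  # dropout regularisation

    # Task-specific heads share the trunk: this is the multi-task part.
    v_lin = layers.Dense(1, name="linear_velocity")(x)
    v_ang = layers.Dense(1, name="angular_velocity")(x)

    model = Model(inputs, [v_lin, v_ang])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-4),
        loss={"linear_velocity": "mse", "angular_velocity": "mse"},
        metrics={"linear_velocity": "mae", "angular_velocity": "mae"},
    )
    return model

if __name__ == "__main__":
    model = build_multitask_model()
    # Dummy batch: 8 feature maps with two velocity targets each.
    features = np.random.rand(8, 64, 128, 1).astype("float32")
    targets = [np.random.rand(8, 1), np.random.rand(8, 1)]
    model.fit(features, targets, epochs=1, verbose=0)

Training both heads against a shared trunk lets the two velocity estimates regularise each other, which is the intuition behind the reported improvement over single-task baselines; the reported mean absolute errors (0.07 m/s and 0.02 rad/s) come from the paper's own evaluation, not from this sketch.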



Acknowledgements

This work was supported by the UK EPSRC Programme Grant EP/M019918/1.

Author information

Correspondence to Letizia Marchegiani.


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper


Cite this paper

Marchegiani, L., Newman, P. (2018). Learning to Listen to Your Ego-(motion): Metric Motion Estimation from Auditory Signals. In: Giuliani, M., Assaf, T., Giannaccini, M. (eds.) Towards Autonomous Robotic Systems. TAROS 2018. Lecture Notes in Computer Science, vol. 10965. Springer, Cham. https://doi.org/10.1007/978-3-319-96728-8_21


  • DOI: https://doi.org/10.1007/978-3-319-96728-8_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-96727-1

  • Online ISBN: 978-3-319-96728-8

  • eBook Packages: Computer Science; Computer Science (R0)
