Autonomous Robots, Volume 43, Issue 8, pp 2293–2317

Motion planning for robot audition

  • Quan V. Nguyen
  • Francis Colas
  • Emmanuel Vincent
  • François Charpillet


Robot audition refers to a range of hearing capabilities that help robots explore and understand their environment. Among them, sound source localization is the problem of estimating the location of a sound source given measurements of its angle of arrival with respect to a microphone array mounted on the robot. In addition, robot motion can quickly resolve the front-back ambiguity inherent to a linear microphone array. In this article, we focus on exploiting robot motion to improve the estimation of the location of an intermittent and possibly moving source in a noisy and reverberant environment. We first propose a robust extended mixture Kalman filtering framework for jointly estimating the source location and its activity over time. Building on this framework, we then propose a long-term robot motion planning algorithm based on Monte Carlo tree search to find an optimal robot trajectory according to one of two alternative criteria: the Shannon entropy or the standard deviation of the estimated belief on the source location. These criteria are integrated over time using a discount factor. Experimental results show that the proposed estimation framework is robust to false angle of arrival measurements within \(\pm \,20^{\circ }\) and to a 10% false source activity detection rate. The proposed robot motion planning technique achieves an average localization error 48.7% smaller than that of a one-step-ahead method. In addition, we analyze the correlation between the estimation error and each of the two criteria, and investigate the effect of the discount factor on the performance of the proposed motion planning algorithm.
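The abstract describes scoring candidate robot trajectories by the entropy or the standard deviation of the belief on the source location, integrated over time with a discount factor. The Python sketch below illustrates that idea under simplifying assumptions that are ours, not the paper's: a single, static, always-active source in 2-D; a plain extended Kalman filter with a Gaussian belief (not the proposed mixture filter); an unambiguous angle-of-arrival measurement (the front-back ambiguity is ignored); "standard deviation" taken as the square root of the covariance trace; and a discounted sum \(J = \sum_k \gamma^k c(P_k)\) starting at \(k = 0\). The function names (`ekf_bearing_update`, `discounted_cost`, `std_dev`) are hypothetical. Because the EKF covariance update depends only on the linearization point, the predicted covariance along a trajectory can be computed without simulating noisy measurements.

```python
import math

def wrap(angle):
    """Wrap an angle to [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))

def ekf_bearing_update(mean, cov, pose, z, sigma):
    """EKF update of a static 2-D source position (mean, 2x2 cov) from one
    angle of arrival z (rad) measured relative to the robot heading.
    pose = (px, py, theta); sigma = measurement noise std (rad). Assumed model."""
    px, py, th = pose
    dx, dy = mean[0] - px, mean[1] - py
    r2 = dx * dx + dy * dy
    h = [-dy / r2, dx / r2]  # Jacobian of atan2(dy, dx) - theta w.r.t. (x, y)
    innov = wrap(z - (math.atan2(dy, dx) - th))
    ph = [cov[0][0] * h[0] + cov[0][1] * h[1],   # P H^T
          cov[1][0] * h[0] + cov[1][1] * h[1]]
    s = h[0] * ph[0] + h[1] * ph[1] + sigma ** 2  # innovation variance
    k = [ph[0] / s, ph[1] / s]                     # Kalman gain
    new_mean = [mean[0] + k[0] * innov, mean[1] + k[1] * innov]
    # (I - K H) P, written as P - K S K^T for a scalar measurement
    new_cov = [[cov[i][j] - k[i] * k[j] * s for j in range(2)] for i in range(2)]
    return new_mean, new_cov

def entropy(cov):
    """Differential entropy (nats) of a 2-D Gaussian with covariance cov."""
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    return 0.5 * math.log((2 * math.pi * math.e) ** 2 * det)

def std_dev(cov):
    """Assumed scalar spread: square root of the covariance trace (metres)."""
    return math.sqrt(cov[0][0] + cov[1][1])

def discounted_cost(mean, cov, trajectory, sigma, gamma, criterion):
    """sum_k gamma**k * criterion(P_k) along a list of robot poses, where P_k is
    the covariance after the k-th predicted measurement. The measurement is
    predicted at the current mean, so the innovation is zero and only P changes."""
    total = 0.0
    for k, pose in enumerate(trajectory):
        px, py, th = pose
        z_pred = math.atan2(mean[1] - py, mean[0] - px) - th
        mean, cov = ekf_bearing_update(mean, cov, pose, z_pred, sigma)
        total += gamma ** k * criterion(cov)
    return total

if __name__ == "__main__":
    mean0, cov0 = [2.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]  # belief: source ~2 m ahead
    toward = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0)]    # drive at source
    sideways = [(0.0, 0.0, 0.0), (0.0, 0.5, 0.0), (0.0, 1.0, 0.0)]  # lateral move
    for name, crit in [("entropy", entropy), ("std", std_dev)]:
        a = discounted_cost(mean0, cov0, toward, 0.1, 0.9, crit)
        b = discounted_cost(mean0, cov0, sideways, 0.1, 0.9, crit)
        print(f"{name}: toward={a:.3f} sideways={b:.3f}")
```

With the belief centred 2 m ahead of the robot, the sideways trajectory yields a lower discounted cost than driving straight at the source under both criteria, because bearings taken from a wider baseline also constrain the range. A Monte Carlo tree search planner would compare many such branches of a trajectory tree rather than two hand-picked candidates.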


Keywords: Robot audition; Motion planning; Sound source localization; Extended mixture Kalman filter; Monte Carlo tree search



Experiments presented in this paper were carried out using the Grid’5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Quan V. Nguyen (1, 2)
  • Francis Colas (1)
  • Emmanuel Vincent (1)
  • François Charpillet (1)
  1. Université de Lorraine, CNRS, Inria, Loria, Nancy, France
  2. Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, Grenoble, France
