Abstract
Audition is often described by physiologists as the most important human sense, owing to its essential role in communication and socialization. Surprisingly, robotics took an interest in this modality only in the 2000s, when cognitive robotics and human–robot interaction brought it to the fore. Since then, numerous contributions have been made to robot audition, ranging from sound-source localization to auditory scene analysis. Binaural approaches were investigated first, then largely abandoned after mixed results. Recent years, however, have witnessed a renewed interest in binaural active audition, that is, in the opportunities and challenges opened by coupling binaural sensing with robot motion. This chapter proposes a comprehensive state of the art of binaural approaches to robot audition. Although the literature on binaural hearing and, more generally, on acoustics and signal processing is a fundamental source of knowledge, the tasks, constraints, and environments of robotics raise original issues. These are reviewed before the most prominent contributions, platforms, and projects are presented. Two lines of research in binaural active audition, conducted by the present authors, are then outlined, one of which is tightly connected to the psychology of perception.
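To give a flavor of the kind of computation binaural localization involves, here is a minimal, self-contained sketch, not taken from the chapter, of azimuth estimation from the interaural time difference (ITD) using the classic GCC-PHAT correlator. The 0.18 m microphone spacing and the far-field free-field model are illustrative assumptions.

```python
import numpy as np

def gcc_phat(x_left, x_right, fs):
    """Estimate the interaural time difference (seconds) between two
    microphone signals via the GCC-PHAT cross-correlation."""
    n = len(x_left) + len(x_right)
    X = np.fft.rfft(x_left, n=n)
    Y = np.fft.rfft(x_right, n=n)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12        # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

def itd_to_azimuth(tau, mic_distance=0.18, c=343.0):
    """Invert the far-field free-field model tau = d*sin(theta)/c."""
    return np.degrees(np.arcsin(np.clip(tau * c / mic_distance, -1.0, 1.0)))

# Synthetic check: a noise burst reaches the left ear later than the right,
# delayed by the whole-sample lag expected for a source at ~30 deg azimuth.
fs, d = 16000, 0.18
rng = np.random.default_rng(0)
burst = rng.standard_normal(fs // 4)
delay = int(round(d * np.sin(np.radians(30.0)) / 343.0 * fs))  # ~4 samples
left = np.concatenate((np.zeros(delay), burst))
right = np.concatenate((burst, np.zeros(delay)))
tau = gcc_phat(left, right, fs)
azimuth = itd_to_azimuth(tau, mic_distance=d)  # near 30 deg (delay quantized)
```

In a real robot the free-field model breaks down: head scattering, reverberation, and ego-noise distort the cues, which is precisely where the head-related transfer function, active head motion, and learning-based approaches surveyed in the chapter come in.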
Notes
- 4.
HARK: HRI-JP Audition for Robots with Kyoto University, http://winnie.kuis.kyoto-u.ac.jp/HARK/. As in Ariel's Song from Shakespeare's The Tempest, "hark" is an archaic English word for "listen."
Acknowledgments
This work was conducted within the project Binaural Active Audition for Humanoid Robots (BINAAHR), funded under contract #ANR-09-BLAN-0370-02 by ANR, France, and JST, Japan. The authors thank two anonymous reviewers for their valuable suggestions.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this chapter
Argentieri, S., Portello, A., Bernard, M., Danès, P., Gas, B. (2013). Binaural Systems in Robotics. In: Blauert, J. (eds) The Technology of Binaural Listening. Modern Acoustics and Signal Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37762-4_9
DOI: https://doi.org/10.1007/978-3-642-37762-4_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37761-7
Online ISBN: 978-3-642-37762-4