Sound and Visual Tracking by Active Audition

  • Hiroshi G. Okuno
  • Kazuhiro Nakadai
  • Tino Lourens
  • Hiroaki Kitano
Conference paper


Active perception in vision and audition is essential in robot-human interaction. The audition system of an intelligent humanoid requires localization of sound sources and identification of the meanings of sounds in the auditory scene. However, the performance of such processing may deteriorate because the coupling of perception and behavior causes mechanical noise. The active audition system reported in this paper adaptively cancels motor noise by using heuristics based on motor control signals, and localizes multiple sound sources. The sound and visual tracking system implemented on the SIG humanoid demonstrates the effectiveness and robustness of sound and visual tracking in multiple-sound-source environments.
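The abstract describes two mechanisms: localizing sound sources from binaural input, and heuristically canceling motor noise using the motor control signal. As a minimal illustrative sketch only (not the authors' implementation; microphone spacing, sample rate, and all function names are assumptions), azimuth can be estimated from the interaural time difference found by cross-correlation, and frames captured while a motor command is active can be discarded:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s
MIC_DISTANCE = 0.18     # m; assumed spacing between the two microphones

def estimate_azimuth(left, right, sample_rate):
    """Estimate source azimuth (degrees) from the interaural time
    difference, taken as the lag of the cross-correlation peak.
    Positive lag means the left channel is delayed, i.e. the source
    is closer to the right microphone (positive azimuth)."""
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)  # lag in samples
    itd = lag / sample_rate                   # interaural time difference (s)
    # Clamp to the physically possible range before taking arcsin.
    s = np.clip(itd * SPEED_OF_SOUND / MIC_DISTANCE, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))

def gate_frames(frames, motor_active):
    """Heuristic motor-noise suppression: drop audio frames that were
    captured while a motor control signal indicated the motor was on."""
    return [f for f, active in zip(frames, motor_active) if not active]
```

This is a deliberately simplified stand-in: the paper's system localizes multiple simultaneous sources and adapts its cancellation, whereas this sketch handles a single dominant source and simply gates frames.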


Keywords: sound source, humanoid robot, visual tracking, head direction, speaker identification





Copyright information

© Springer Japan 2002

Authors and Affiliations

  • Hiroshi G. Okuno
    • 1
    • 2
  • Kazuhiro Nakadai
    • 1
  • Tino Lourens
    • 1
    • 3
  • Hiroaki Kitano
    • 1
    • 4
  1. Kitano Symbiotic Systems Project, ERATO, Japan Science and Technology Corp., Shibuya, Tokyo, Japan
  2. Graduate School of Informatics, Kyoto University, Sakyo, Kyoto, Japan
  3. Starlab DF-1, Brussels, Belgium
  4. Sony Computer Science Laboratories, Inc., Shinagawa, Tokyo, Japan
