Speaker Localization in CHIL Lectures: Evaluation Criteria and Results

  • Conference paper
Machine Learning for Multimodal Interaction (MLMI 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3869)

Abstract

This work addresses the problem of automatic speaker localization and tracking in a real lecture scenario. Evaluation criteria recently adopted under CHIL and NIST benchmarking are outlined. Two speaker localization systems are described, based on Generalized Cross Correlation Phase Transform (GCC-PHAT) analysis and on the Global Coherence Field (GCF). Benchmarking results, obtained on a set of 13 lectures, show an average RMS error of about 30 cm in speaker localization.
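
As an illustration of the Generalized Cross Correlation Phase Transform analysis named in the abstract, the following minimal sketch estimates the time delay between two microphone signals by whitening their cross-power spectrum and locating the peak of its inverse transform. It is not the authors' implementation: the function names `dft` and `gcc_phat_delay`, the frame length, and the synthetic test signal are all assumptions made for this example, and a naive DFT is used only to keep the sketch dependency-free.

```python
import cmath
import random


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat_delay(sig_a, sig_b, max_lag):
    """Return the lag in samples by which sig_b trails sig_a (hypothetical helper).

    The cross-power spectrum is normalised to unit magnitude (the phase
    transform), so only phase information determines the correlation peak.
    """
    n = len(sig_a)
    spec_a = dft(sig_a)
    spec_b = dft(sig_b)
    cross = [b * a.conjugate() for a, b in zip(spec_a, spec_b)]
    whitened = [c / max(abs(c), 1e-12) for c in cross]
    corr = [v.real for v in dft(whitened, inverse=True)]
    # The correlation is circular: index n - k corresponds to lag -k.
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: corr[lag % n])


# Synthetic example (assumed data): a noise burst reaching microphone B
# five samples after microphone A, zero-padded to a 64-sample frame.
random.seed(0)
burst = [random.uniform(-1.0, 1.0) for _ in range(48)]
mic_a = burst + [0.0] * 16
mic_b = [0.0] * 5 + burst + [0.0] * 11

delay_ab = gcc_phat_delay(mic_a, mic_b, max_lag=10)
delay_ba = gcc_phat_delay(mic_b, mic_a, max_lag=10)
print(delay_ab, delay_ba)
```

A delay in samples converts to a path-length difference by dividing by the sampling rate and multiplying by the speed of sound. Delays from several microphone pairs can then be intersected geometrically, or coherence values accumulated over a grid of candidate positions, which is the general idea behind a coherence field.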


Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Omologo, M., Svaizer, P., Brutti, A., Cristoforetti, L. (2006). Speaker Localization in CHIL Lectures: Evaluation Criteria and Results. In: Renals, S., Bengio, S. (eds) Machine Learning for Multimodal Interaction. MLMI 2005. Lecture Notes in Computer Science, vol 3869. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11677482_40

  • DOI: https://doi.org/10.1007/11677482_40

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32549-9

  • Online ISBN: 978-3-540-32550-5

  • eBook Packages: Computer Science; Computer Science (R0)