A Speaker Localization System for Lecture Room Environment

Parviainen, Mikko; Pirinen, Tuomo; Pertilä, Pasi

doi:10.1007/11965152_20

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4299)

Abstract

This paper presents a speaker localization system that was entered in the Rich Transcription 2005 Spring Meeting Recognition Evaluation. The system was developed at the Institute of Signal Processing at Tampere University of Technology (TUT). The paper describes the framework of the evaluation and the proposed localization system. It extends [1] by reporting the actual performance values of the system.

The localization system is based on spatially separated sensor stations. Each sensor station estimates the Direction of Arrival (DOA) of acoustic wavefronts and produces a three-dimensional DOA vector. The DOA vectors estimated at each time instant are combined to calculate the location of the sound source.
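The abstract does not state how the DOA vectors are combined, so the following is only a minimal sketch under one common assumption: each station defines a line from its position along its DOA vector, and the source is taken as the point with the smallest summed squared distance to all lines. The names `locate_source`, `stations`, and `doas` are hypothetical and are not taken from the paper.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 system a x = b with Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("DOA lines are (nearly) parallel; no unique location")
    x = []
    for k in range(3):
        # Replace column k of a with b, then take the determinant ratio.
        m = [[b[r] if c == k else a[r][c] for c in range(3)] for r in range(3)]
        x.append(det3(m) / d)
    return x


def locate_source(stations, doas):
    """Least-squares point closest to all station bearing lines.

    stations: list of [x, y, z] sensor station positions (assumed known).
    doas:     list of [x, y, z] DOA vectors, one per station, pointing
              from the station toward the source (any non-zero length).
    """
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for p, d in zip(stations, doas):
        u = normalize(d)
        for r in range(3):
            for c in range(3):
                # (I - u u^T) projects onto the plane orthogonal to the line.
                proj = (1.0 if r == c else 0.0) - u[r] * u[c]
                a[r][c] += proj
                b[r] += proj * p[c]
    return solve3(a, b)
```

With noise-free DOA vectors from at least two stations whose lines are not parallel, this sketch returns the exact intersection point; with noisy estimates it returns a compromise between the lines.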

The performance of the system was determined using a set of predefined metrics. Using multiple metrics makes it possible to evaluate the localization system from different viewpoints. The overall performance is characterized by the RMS error between the estimates and the reference positions. The results show that the performance of the proposed system is consistent and that its accuracy is satisfactory for a meeting room scenario. However, there is room for several improvements.
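As an illustration of the overall metric, the sketch below computes an RMS error between estimated and reference positions. It assumes the error for each time instant is the Euclidean distance between the two positions; the exact definition used in the evaluation is given in the evaluation plan, not here, and the function name `rms_error` is hypothetical.

```python
import math


def rms_error(estimates, references):
    """Root-mean-square Euclidean error between paired positions.

    estimates, references: equal-length lists of [x, y, z] positions,
    one pair per time instant (assumed to be already time-aligned).
    """
    if not estimates or len(estimates) != len(references):
        raise ValueError("need equally many estimates and references")
    squared = [
        sum((e - r) ** 2 for e, r in zip(est, ref))
        for est, ref in zip(estimates, references)
    ]
    return math.sqrt(sum(squared) / len(squared))
```

Because the errors are squared before averaging, a few large misses dominate this figure, which is one reason to report it alongside other metrics.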

References

Pirinen, T., Pertilä, P., Parviainen, M.: The TUT 2005 source localization system. In: Rich Transcription 2005 Spring Meeting Recognition Evaluation, July 13, 2005. Royal College of Physicians, Edinburgh (2005)
Nakatani, T., Okuno, H.G.: Harmonic sound stream segregation using localization and its application to speech stream segregation. Speech Communication 27, 209–222 (1999)
Bregman, A.S.: Auditory Scene Analysis. The MIT Press, Cambridge (1990)
National Institute of Standards and Technology: Spring 2005 (RT-05S) Rich Transcription Meeting Recognition Evaluation Plan. (2005), http://www.nist.gov/speech/tests/rt/rt2005/spring/rt05s-meeting-eval-plan-V1.pdf
Omologo, M., Brutti, A., Svaizer, P.: Speaker localization and tracking – evaluation criteria. Technical report, National Institute of Standards and Technology (2005), http://www.nist.gov/speech/tests/rt/rt2005/spring/sloc/CHILIRST_SpeakerLocEval-V5.0-2005-01-18.pdf
Surcin, S., Stiefelhagen, R., McDonough, J.: D7.4 evaluation packages for the first CHIL evaluation campaign. Technical report, Computers in the Human Interaction Loop (CHIL) Consortium (2005)
Stiefelhagen, R.: CHIL evaluation data – overview of sensor setup and recordings. Technical report, Computers in the Human Interaction Loop (CHIL) Consortium (2004)
Knapp, C., Carter, G.C.: The generalized correlation method for estimation of time delay. IEEE Transactions on Acoustics, Speech, and Signal Processing 24(4), 320–327 (1976)
Yli-Hietanen, J., Kalliojärvi, K., Astola, J.: Low-complexity angle of arrival estimation of wideband signals using small arrays. In: Proceedings of the 8th IEEE Signal Processing Workshop on Statistical Signal and Array Signal Processing, pp. 109–112 (1996)
Haykin, S., Justice, J.: Array signal processing. Academic Press, London (1985)
Pirinen, T.: Normalized confidence factors for robust direction of arrival estimation. In: Proceedings of the 2005 IEEE International Symposium on Circuits and Systems (ISCAS) (2004)
Kaplan, L., Le, Q., Molnár, P.: Maximum likelihood methods for bearings-only target localization. In: Proceedings of the 2001 IEEE International Conference on Acoustics Speech, and Signal Processing (ICASSP 2001), pp. 3001–3004 (2001)
Hawkes, M., Nehorai, A.: Wideband source localization using a distributed acoustic vector-sensor array. IEEE Transactions on Signal Processing 51(6), 1479–1491 (2003)
Pertilä, P., Parviainen, M., Korhonen, T., Visa, A.: A spatiotemporal approach to passive sound source localization. In: International Symposium on Communications and Information Technologies (ISCIT 2004) (2004)

Author information

Authors and Affiliations

Institute of Signal Processing, Tampere University of Technology, Tampere, P.O. Box 553, FIN-33101, Finland
Mikko Parviainen, Tuomo Pirinen & Pasi Pertilä

Editor information

Editors and Affiliations

University of Edinburgh, Edinburgh, Scotland
Steve Renals
IDIAP Research Institute, Martigny, Switzerland
Samy Bengio
National Institute Of Standards and Technology, 100 Bureau Drive Stop 8940, Gaithersburg, MD, 20899
Jonathan G. Fiscus

Cite this paper

Parviainen, M., Pirinen, T., Pertilä, P. (2006). A Speaker Localization System for Lecture Room Environment. In: Renals, S., Bengio, S., Fiscus, J.G. (eds) Machine Learning for Multimodal Interaction. MLMI 2006. Lecture Notes in Computer Science, vol 4299. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11965152_20

DOI: https://doi.org/10.1007/11965152_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69267-6
Online ISBN: 978-3-540-69268-3
eBook Packages: Computer Science, Computer Science (R0)