A Robust Statistical-Based Speaker's Location Detection Algorithm in a Vehicular Environment

Open Access
Research Article
Part of the following topical collections:
  1. Spatial Sound and Virtual Acoustics

Abstract

This work presents a robust speaker's location detection algorithm using a single linear microphone array that is capable of detecting multiple speech sources under the assumption that there exist nonoverlapped speech segments among sources. Namely, the overlapped speech segments are treated as uncertainty and are not used for detection. The location detection algorithm is derived from a previous work (2006), where Gaussian mixture models (GMMs) are used to model location-dependent but content- and speaker-independent phase difference distributions. The proposed algorithm is proven to be robust against complex vehicular acoustics, including noise, reverberation, near-field, far-field, line-of-sight, and non-line-of-sight conditions, as well as microphone mismatch. An adaptive system architecture is developed to adjust the Gaussian mixture (GM) location model to environmental noise. To deal with unmodeled speech sources as well as overlapped speech signals, a threshold adaptation scheme is proposed in this work. Experimental results demonstrate high detection accuracy in a noisy vehicular environment.
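
The detection idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes diagonal-covariance GMMs whose parameters are already trained, uses a fixed rejection threshold in place of the adaptive threshold scheme, and all names (`detect_location`, `location_models`, the seat labels, and the numeric parameters) are hypothetical. Each location has its own GMM over phase-difference feature vectors, the location with the highest average log-likelihood is selected, and a segment that scores below the threshold for every model is rejected as unmodeled or overlapped speech.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a mixture given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp keeps the sum numerically stable
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def detect_location(frames, location_models, threshold):
    """Pick the best-scoring location, or None if no model fits well enough.

    frames: list of phase-difference feature vectors, one per frame
    location_models: dict mapping a location label to its GMM
    threshold: minimum average per-frame log-likelihood (fixed here; the
               paper adapts it, which this sketch does not attempt)
    """
    scores = {
        label: sum(gmm_log_likelihood(f, gmm) for f in frames) / len(frames)
        for label, gmm in location_models.items()
    }
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores


# Hypothetical models: 2-D phase differences (radians) for two seats.
location_models = {
    "driver": [(0.6, [0.5, 1.0], [0.04, 0.04]), (0.4, [0.7, 1.2], [0.09, 0.09])],
    "passenger": [(1.0, [-0.5, -1.0], [0.04, 0.04])],
}

near_driver = [[0.52, 1.03], [0.48, 0.95]]
unmodeled = [[3.0, -3.0]]

label_a, scores_a = detect_location(near_driver, location_models, threshold=-5.0)
label_b, scores_b = detect_location(unmodeled, location_models, threshold=-5.0)
print(label_a, label_b)
```

Averaging the log-likelihood over frames makes the threshold independent of segment length, which is one simple way to compare short and long nonoverlapped segments on the same scale.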

Keywords

Mixture Model, Acoustics, Speech Signal, Adaptation Scheme, Gaussian Mixture Model

References

  1. Ryan JG, Goubran RA: Application of near-field optimum microphone arrays to hands-free mobile telephony. IEEE Transactions on Vehicular Technology 2003,52(2):390-400.
  2. Pulasinghe K, Watanabe K, Izumi K, Kiguchi K: Modular fuzzy-neuro controller driven by spoken language commands. IEEE Transactions on Systems, Man, and Cybernetics, Part B 2004,34(1):293-302. doi:10.1109/TSMCB.2003.811511
  3. Herbordt W, Horiuchi T, Fujimoto M, Jitsuhiro T, Nakamura S: Noise-robust hands-free speech recognition on PDAs using microphone array technology. Autumn Meeting of the Acoustical Society of Japan, September 2005, Sendai, Japan 51-54.
  4. Gannot S, Burshtein D, Weinstein E: Signal enhancement using beamforming and nonstationarity with applications to speech. IEEE Transactions on Signal Processing 2001,49(8):1614-1626. doi:10.1109/78.934132
  5. Aarabi P, Shi G: Phase-based dual-microphone robust speech enhancement. IEEE Transactions on Systems, Man, and Cybernetics, Part B 2004,34(4):1763-1773. doi:10.1109/TSMCB.2004.830345
  6. Hu J-S, Cheng C-C: Frequency domain microphone array calibration and beamforming for automatic speech recognition. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences 2005,E88-A(9):2401-2411. doi:10.1093/ietfec/e88-a.9.2401
  7. Ahn S, Ko H: Background noise reduction via dual-channel scheme for speech recognition in vehicular environment. IEEE Transactions on Consumer Electronics 2005,51(1):22-27. doi:10.1109/TCE.2005.1405694
  8. Carter GC, Nuttall AH, Cable PG: The smoothed coherence transform. Proceedings of the IEEE 1973,61(10):1497-1498.
  9. Knapp CH, Carter GC: The generalized correlation method for estimation of time delay. IEEE Transactions on Acoustics, Speech, and Signal Processing 1976, 24: 320-327. doi:10.1109/TASSP.1976.1162830
  10. Bienvenu G: Eigensystem properties of the sampled space correlation matrix. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '83), 1983, Boston, Mass, USA 8: 332-335.
  11. Wax M, Shan T-J, Kailath T: Spatio-temporal spectral analysis by eigenstructure methods. IEEE Transactions on Acoustics, Speech, and Signal Processing 1984,32(4):817-827. doi:10.1109/TASSP.1984.1164400
  12. Wang H, Kaveh M: Coherent signal-subspace processing for the detection and estimation of angles of arrival of multiple wide-band sources. IEEE Transactions on Acoustics, Speech, and Signal Processing 1985,33(4):823-831. doi:10.1109/TASSP.1985.1164667
  13. Smith JO, Abel JS: Closed-form least-squares source location estimation from range-difference measurements. IEEE Transactions on Acoustics, Speech, and Signal Processing 1987,35(12):1661-1669. doi:10.1109/TASSP.1987.1165089
  14. Hu J-S, Cheng C-C, Liu W-H, Su TM: A speaker tracking system with distance estimation using microphone array. Proceedings of the IEEE/ASME International Conference on Advanced Manufacturing Technologies and Education, August 2002, Chiayi, Taiwan 485-494.
  15. Hu J-S, Su TM, Cheng C-C, Liu W-H, Wu TI: A self-calibrated speaker tracking system using both audio and video data. Proceedings of the IEEE Conference on Control Applications, September 2002, Glasgow, Scotland 2: 731-735.
  16. Omologo M, Svaizer P: Acoustic source location in noisy and reverberant environment using CSP analysis. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '96), May 1996, Atlanta, Ga, USA 901-904.
  17. Brandstein MS, Silverman HF: A robust method for speech signal time-delay estimation in reverberant rooms. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '97), April 1997, Munich, Germany 1: 375-378.
  18. Strobel N, Rabenstein R: Classification of time delay estimates for robust speaker localization. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '99), March 1999, Phoenix, Ariz, USA 6: 3081-3084.
  19. Mavandadi S, Aarabi P: Multichannel nonlinear phase analysis for time-frequency data fusion. Multisensor, Multisource Information Fusion: Architectures, Algorithms, and Applications 2003, April 2003, Orlando, Fla, USA, Proceedings of SPIE 5099: 222-231.
  20. Aarabi P, Mavandadi S: Robust sound localization using conditional time-frequency histograms. Information Fusion 2003,4(2):111-122. doi:10.1016/S1566-2535(03)00003-4
  21. Ward DB, Williamson RC: Particle filter beamforming for acoustic source localization in a reverberant environment. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '02), May 2002, Orlando, Fla, USA 2: 1777-1780.
  22. Potamitis I, Chen H, Tremoulis G: Tracking of multiple moving speakers with multiple microphone arrays. IEEE Transactions on Speech and Audio Processing 2004,12(5):520-529. doi:10.1109/TSA.2004.833004
  23. Chen JC, Yao K, Hudson RE: Acoustic source localization and beamforming: theory and practice. EURASIP Journal on Applied Signal Processing 2003,2003(4):359-370. doi:10.1155/S1110865703212038
  24. Chung P-J, Böhme JF, Hero AO: Tracking of multiple moving sources using recursive EM algorithm. EURASIP Journal on Applied Signal Processing 2005,2005(1):50-60. doi:10.1155/ASP.2005.50
  25. Ng BC, See CMS: Sensor-array calibration using a maximum-likelihood approach. IEEE Transactions on Antennas and Propagation 1996,44(6):827-835. doi:10.1109/8.509886
  26. Ward DB, Lehmann EA, Williamson RC: Particle filtering algorithms for tracking an acoustic source in a reverberant environment. IEEE Transactions on Speech and Audio Processing 2003,11(6):826-836. doi:10.1109/TSA.2003.818112
  27. Hu J-S, Cheng C-C, Liu W-H: Robust speaker's location detection in a vehicle environment using GMM models. IEEE Transactions on Systems, Man, and Cybernetics, Part B 2006,36(2):403-412.
  28. Reynolds DA, Rose RC: Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing 1995,3(1):72-83. doi:10.1109/89.365379
  29. Ramírez J, Segura JC, Benítez C, De la Torre A, Rubio Á: Efficient voice activity detection algorithms using long-term speech information. Speech Communication 2004,42(3-4):271-287. doi:10.1016/j.specom.2003.10.002
  30. Potamitis I: Estimation of speech presence probability in the field of microphone array. IEEE Signal Processing Letters 2004,11(12):956-959. doi:10.1109/LSP.2004.838200
  31. Brandstein M, Ward D: Microphone Arrays: Signal Processing Techniques and Applications. Springer, New York, NY, USA; 2001. chapter 2
  32. Reynolds DA, Rose RC: Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing 1995,3(1):72-83. doi:10.1109/89.365379
  33. Xuan G, Zhang W, Chai P: EM algorithms of Gaussian mixture model and hidden Markov model. Proceedings of the IEEE International Conference on Image Processing (ICIP '01), October 2001, Thessaloniki, Greece 1: 145-148.
  34. Mitsubishi Motors - Savrin (http://www.sym-motor.com.tw/savrin-1.htm)
  35. Ryan JG, Goubran RA: Near-field beamforming for microphone arrays. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '97), April 1997, Munich, Germany 1: 363-366.
  36. Vries DD, Hulsebos EM, Bann J: Spatial fluctuations in measures for spaciousness. Journal of the Acoustical Society of America 2001,110(2):947-954. doi:10.1121/1.1377634
  37. Pelorson X, Vian J-P, Polack J-D: On the variability of room acoustical parameters: reproducibility and statistical validity. Applied Acoustics 1992,37(3):175-198. doi:10.1016/0003-682X(92)90002-A

Copyright information

© Jwu-Sheng Hu et al. 2007

Authors and Affiliations

  1. Department of Electrical and Control Engineering, National Chiao Tung University, Hsinchu, Taiwan
