Analysis of Monaural and Binaural Statistical Properties for the Estimation of Distance of a Target Speaker

Abstract

The paper presents an auditory distance perception model based on statistical properties extracted from monaural and binaural features in a reverberant room environment. The framework considers both mono and stereo speech signals originating from different distances at various reverberation times. Accordingly, two models are discussed in this study: single-channel monaural statistics and binaural-channel monaural statistics. Distance-dependent statistical features computed from fused monaural coefficients, namely cepstral and envelope features, serve as input to different classification algorithms, such as Gaussian mixture model-expectation maximization, support vector machine and random forest, to estimate the distance of a desired target speaker. For binaural speech signals, the monaural coefficients are extracted in addition to binaural cues, namely interaural time difference (ITD), interaural level difference (ILD) and interaural coherence (IC), and then applied to distance estimation. The proposed monaural and binaural models achieve, on average, more than 5% better results than existing baseline techniques, even at a low signal-to-noise ratio of 0 dB.
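To make the binaural cues and the statistical features named in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes common textbook definitions: ILD as a left-to-right energy ratio in dB, ITD as the lag of the cross-correlation peak, and IC as the height of that normalized peak. The function names, the lag range, and the choice of mean, variance, skewness and kurtosis as statistics are illustrative assumptions. The abstract does not specify these details, and the cepstral and envelope features and the classifiers are not reproduced here.

```python
import math


def ild_db(left, right, eps=1e-12):
    """Interaural level difference in dB (left energy over right energy)."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    return 10.0 * math.log10((e_left + eps) / (e_right + eps))


def normalized_xcorr(left, right, max_lag):
    """Normalized cross-correlation for integer lags in [-max_lag, max_lag].

    Convention (an assumption of this sketch): a peak at a positive lag
    means the right channel is a delayed copy of the left channel.
    """
    n = min(len(left), len(right))
    e_left = sum(x * x for x in left[:n])
    e_right = sum(x * x for x in right[:n])
    norm = math.sqrt(e_left * e_right) or 1.0  # avoid division by zero
    xcorr = {}
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += left[i] * right[j]
        xcorr[lag] = acc / norm
    return xcorr


def itd_and_ic(left, right, fs, max_lag):
    """Return (ITD in seconds, IC) from the cross-correlation peak."""
    xcorr = normalized_xcorr(left, right, max_lag)
    best_lag = max(xcorr, key=lambda lag: xcorr[lag])
    return best_lag / fs, xcorr[best_lag]


def moments(values):
    """Mean, variance, skewness and kurtosis of a feature track."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0.0:
        return mean, 0.0, 0.0, 0.0
    std = math.sqrt(var)
    skew = sum(((v - mean) / std) ** 3 for v in values) / n
    kurt = sum(((v - mean) / std) ** 4 for v in values) / n
    return mean, var, skew, kurt
```

In a pipeline of the kind the abstract describes, such cues would typically be computed per frame, summarized with statistics like those returned by `moments`, and passed to a classifier. The frame length, any filterbank, and the feature fusion step are left out of this sketch.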

Acknowledgements

The authors wish to thank the Department of Science and Technology for awarding a project under the Cognitive Science Initiative Programme (DST File No.: SR/CSI/09/2011), through which this work was carried out. The authors are also very grateful to the anonymous reviewers for their valuable and constructive suggestions.

Author information

Correspondence to A. Balaji Ganesh.

About this article

Cite this article

Venkatesan, R., Ganesh, A.B. Analysis of Monaural and Binaural Statistical Properties for the Estimation of Distance of a Target Speaker. Circuits Syst Signal Process (2020) doi:10.1007/s00034-019-01333-5

Keywords

  • Monaural features
  • Room acoustics
  • Distance-dependent statistical properties
  • Hilbert envelope features
  • Binaural cues
  • Classification models