
Glottal Activity Detection from the Speech Signal Using Multifractal Analysis

  • G. Jyothish Lal
  • E. A. Gopalakrishnan (corresponding author)
  • D. Govind

Abstract

This work proposes a novel method for detecting glottal activity regions in the speech signal. Glottal activity detection refers to the problem of discriminating voiced from unvoiced segments of the speech signal, a fundamental step in the workflow of many speech processing applications. Most existing approaches to voiced/unvoiced detection rely on linear measures, even though speech is produced by an underlying nonlinear process. The present work addresses the problem from a nonlinear perspective, using the framework of multifractal analysis. The fractal properties of the speech signal during the production of voiced and unvoiced sounds are exploited to characterize glottal activity. This characterization is obtained by computing the Hurst exponent from the scaling behavior of fluctuations in the speech signal. Experimental analysis shows that the Hurst exponent varies consistently with the dynamics of glottal activity. The performance of the proposed method has been evaluated on the CMU-arctic, Keele, and KED-Timit databases with simultaneous electroglottogram signals. Experimental results show that the average detection accuracy and error rate of the proposed method are comparable to those of the best-performing algorithm on clean speech. Moreover, an evaluation of robustness to noise degradation shows results comparable to other methods at signal-to-noise ratios above 10 dB for white noise and above 20 dB for babble noise.
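
The abstract describes the method only at a high level. As a rough illustration of the idea, the Python sketch below estimates a per-frame Hurst exponent with ordinary detrended fluctuation analysis (the q = 2 case of MFDFA) and thresholds it to label frames voiced or unvoiced. The function names, scales, frame lengths, and the 0.7 voicing threshold are assumptions chosen for illustration, not values taken from the paper.

```python
import numpy as np

def dfa_hurst(x, scales=(16, 32, 64, 128), order=1):
    """Estimate the Hurst exponent of a 1-D signal via detrended
    fluctuation analysis (DFA), i.e. the q = 2 case of MFDFA."""
    x = np.asarray(x, dtype=float)
    # Profile: cumulative sum of the mean-removed signal.
    y = np.cumsum(x - x.mean())
    log_s, log_f = [], []
    for s in scales:
        n_seg = len(y) // s
        if n_seg < 2:
            continue  # scale too large for this frame
        segments = y[:n_seg * s].reshape(n_seg, s)
        t = np.arange(s)
        # Detrend each segment with a local polynomial fit and
        # collect the mean squared residual.
        ms = [np.mean((seg - np.polyval(np.polyfit(t, seg, order), t)) ** 2)
              for seg in segments]
        f_s = max(np.sqrt(np.mean(ms)), 1e-12)  # fluctuation function F(s)
        log_s.append(np.log(s))
        log_f.append(np.log(f_s))
    # The Hurst exponent is the slope of log F(s) versus log s.
    return np.polyfit(log_s, log_f, 1)[0]

def glottal_activity(x, fs, frame_ms=25.0, hop_ms=10.0, threshold=0.7):
    """Label each analysis frame voiced (True) or unvoiced (False)
    by thresholding its Hurst exponent (illustrative decision rule;
    the paper derives its decision boundary empirically)."""
    frame = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    labels = []
    for start in range(0, len(x) - frame + 1, hop):
        h = dfa_hurst(x[start:start + frame])
        labels.append(h > threshold)
    return np.array(labels)
```

For a 16 kHz recording, glottal_activity(x, 16000) returns one Boolean per 10 ms hop; the paper itself scores such decisions against ground truth derived from the simultaneous electroglottogram recordings.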

Keywords

Glottal activity detection · Voiced/unvoiced detection · Multifractal analysis · Hurst exponent · Speech signal · Nonlinear approach

Notes

Acknowledgements

The authors gratefully acknowledge Amrita Vishwa Vidyapeetham for the generous funding provided to the first author in pursuing his Ph.D. Further, we thank Dr. Vineeth Nair (IIT Bombay) for providing a better understanding of MFDFA through his Ph.D. thesis.


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • G. Jyothish Lal¹
  • E. A. Gopalakrishnan¹ (corresponding author)
  • D. Govind¹

  1. Center for Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
