Abstract
The objective of this work is to explore the significance of efficient glottal activity detection for inter-emotion conversion. The performance of popular glottal epoch detection algorithms, namely the Dynamic Projected Phase-Slope Algorithm (DYPSA), Speech Event Detection using Residual Excitation And a Mean-based Signal (SEDREAMS) and Zero Frequency Filtering (ZFF), is compared in the context of vocal emotion conversion. Existing conversion approaches deal with synthesis or conversion from neutral speech to different emotions. In this work, we demonstrate the efficacy of determining the conversion parameters from statistical values derived from multiple emotions and using them for inter-emotion conversion in an Indian context. Pitch modification is effected using transformation scales derived from both male and female speakers in the IIT Kharagpur Simulated Emotion Speech Corpus (IITKGP-SESC). Three archetypal emotions, viz. anger, fear and happiness, were generated using a pitch and amplitude modification algorithm. Analysis of the statistical pitch parameters after conversion revealed that anger gives good subjective and objective similarity, while the characteristics of fear and happiness are the most challenging to synthesise. Also, the use of a male voice for synthesis gave better intelligibility. Glottal activity detection by ZFF gave the least error for median pitch. The results of this study indicate that, for emotions with overlapping characteristics such as surprise and happiness, inter-emotion conversion can be a better choice than conversion from neutral speech.
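To make the ZFF method compared above more concrete, the following is a minimal Python sketch of zero frequency filtering as described by Murthy and Yegnanarayana, together with a median-F0 estimate computed from the detected epochs. It is not the authors' implementation. The function names `zff_epochs`, `median_f0` and `_local_mean` are hypothetical. The 15 ms trend-removal window (assumed to span roughly 1 to 2 average pitch periods), the three trend-removal passes, the edge-normalised local mean and the synthetic test signal are also illustrative assumptions.

```python
from itertools import accumulate
from statistics import median


def _local_mean(y, half):
    """Mean of y over [n - half, n + half], using only samples that exist."""
    prefix = [0.0] + list(accumulate(y))
    out = []
    for n in range(len(y)):
        lo, hi = max(0, n - half), min(len(y), n + half + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out


def zff_epochs(s, fs, window_ms=15.0, passes=3):
    """Hypothetical ZFF sketch: return epoch (GCI) sample indices of signal s."""
    # 1. Difference the signal, x[n] = s[n] - s[n-1] with s[-1] = 0,
    #    to remove any DC offset.
    x = [float(s[0])] + [float(s[n] - s[n - 1]) for n in range(1, len(s))]
    # 2. Cascade of two zero-frequency resonators, each
    #    y[n] = x[n] + 2*y[n-1] - y[n-2]. Each resonator is a double
    #    integrator, so the cascade equals four cumulative sums.
    y = x
    for _ in range(4):
        y = list(accumulate(y))
    # 3. Remove the polynomial trend by subtracting a local mean over
    #    2N+1 samples, repeated `passes` times (window length is an assumption).
    half = int(round(window_ms * 1e-3 * fs / 2))
    for _ in range(passes):
        y = [v - m for v, m in zip(y, _local_mean(y, half))]
    # 4. Epochs are the negative-to-positive zero crossings.
    return [n for n in range(1, len(y)) if y[n - 1] < 0 <= y[n]]


def median_f0(epochs, fs):
    """Median F0 in Hz from the intervals between successive epochs."""
    return median(fs / (b - a) for a, b in zip(epochs, epochs[1:]))


if __name__ == "__main__":
    fs = 8000
    # Synthetic excitation: negative impulses every 80 samples (100 Hz).
    excitation = [0.0] * fs
    for n in range(0, fs, 80):
        excitation[n] = -1.0
    epochs = zff_epochs(excitation, fs)
    print(f"median F0 = {median_f0(epochs, fs):.1f} Hz")
```

The test signal is a train of negative impulses spaced 80 samples apart at 8 kHz, which serves as a crude stand-in for glottal closures at 100 Hz. On this signal, the detected epochs fall within a few samples of the impulses and the median F0 comes out close to 100 Hz. Epochs within a few window lengths of either end of the signal are unreliable, because the local mean is one-sided there. Using the median rather than the mean of the per-cycle F0 values keeps such edge outliers from dominating the estimate.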
Acknowledgements
We would like to thank Dr. Govind D., Amrita Vishwa Vidyapeetham, Coimbatore, for providing us with the IIT-KGP SESC database used in this research.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Vekkot, S., Tripathi, S. (2018). Significance of Glottal Closure Instants Detection Algorithms in Vocal Emotion Conversion. In: Balas, V., Jain, L., Balas, M. (eds) Soft Computing Applications. SOFA 2016. Advances in Intelligent Systems and Computing, vol 633. Springer, Cham. https://doi.org/10.1007/978-3-319-62521-8_40
DOI: https://doi.org/10.1007/978-3-319-62521-8_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62520-1
Online ISBN: 978-3-319-62521-8
eBook Packages: Engineering, Engineering (R0)