Significance of Glottal Closure Instants Detection Algorithms in Vocal Emotion Conversion

  • Conference paper
Soft Computing Applications (SOFA 2016)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 633)

Abstract

This work explores the significance of efficient glottal activity detection for inter-emotion conversion. The performance of popular glottal epoch detection algorithms, namely the Dynamic Projected Phase-Slope Algorithm (DYPSA), Speech Event Detection using Residual Excitation And a Mean-based Signal (SEDREAMS) and Zero Frequency Filtering (ZFF), is compared in the context of vocal emotion conversion. Existing conversion approaches deal with synthesis or conversion from neutral speech to different emotions. In this work, we demonstrate the efficacy of determining the conversion parameters from statistical values derived from multiple emotions and using them for inter-emotion conversion in an Indian context. Pitch modification is effected using transformation scales derived from both male and female speakers in the IIT Kharagpur Simulated Emotion Speech Corpus. Three archetypal emotions, viz. anger, fear and happiness, were generated using a pitch and amplitude modification algorithm. Analysis of statistical parameters for pitch after conversion revealed that anger gives good subjective and objective similarity, while the characteristics of fear and happiness are the most challenging to synthesise. Also, use of a male voice for synthesis gave better intelligibility. Glottal activity detection by ZFF gave the least error for median pitch. The results indicate that for emotions with overlapping characteristics, like surprise and happiness, inter-emotion conversion can be a better choice than conversion from neutral.
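
The abstract does not state how the transformation scales are computed. As a rough illustration only, the sketch below assumes that a scale is the ratio of the target emotion's median pitch to the source emotion's median pitch, with pitch estimated from the spacing of glottal closure instants (epochs) such as those returned by DYPSA, SEDREAMS or ZFF. The function names, the 50 to 500 Hz plausibility range and the synthetic epoch lists are hypothetical and are not taken from the paper.

```python
from statistics import median


def f0_from_epochs(epochs_s, f0_min=50.0, f0_max=500.0):
    """Instantaneous F0 (Hz) from consecutive epoch times given in seconds.

    Intervals outside the assumed plausible range (for example, the long
    gaps across unvoiced regions) are discarded.
    """
    f0 = []
    for prev, cur in zip(epochs_s, epochs_s[1:]):
        period = cur - prev
        if period <= 0:
            continue  # skip duplicated or unordered epochs
        hz = 1.0 / period
        if f0_min <= hz <= f0_max:
            f0.append(hz)
    return f0


def pitch_scale(source_epochs_s, target_epochs_s):
    """Assumed transformation scale: target median F0 / source median F0."""
    source_median = median(f0_from_epochs(source_epochs_s))
    target_median = median(f0_from_epochs(target_epochs_s))
    return target_median / source_median


# Synthetic example: a 100 Hz source and a 150 Hz target.
source = [i * 0.01 for i in range(11)]
target = [i / 150.0 for i in range(11)]
scale = pitch_scale(source, target)
```

Using the median rather than the mean keeps the estimate robust to occasional spurious or missed epochs, which fits the abstract's choice of median pitch as the error measure.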

Acknowledgements

We would like to thank Dr. Govind D., Amrita Vishwa Vidyapeetham, Coimbatore, for providing us with the IITKGP-SESC database used in this research.

Author information

Corresponding author

Correspondence to Shikha Tripathi.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Vekkot, S., Tripathi, S. (2018). Significance of Glottal Closure Instants Detection Algorithms in Vocal Emotion Conversion. In: Balas, V., Jain, L., Balas, M. (eds) Soft Computing Applications. SOFA 2016. Advances in Intelligent Systems and Computing, vol 633. Springer, Cham. https://doi.org/10.1007/978-3-319-62521-8_40

  • DOI: https://doi.org/10.1007/978-3-319-62521-8_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-62520-1

  • Online ISBN: 978-3-319-62521-8

  • eBook Packages: Engineering, Engineering (R0)
