
Emotion Recognition Using Excitation Source Information

Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSSPEECHTECH)

Abstract

This chapter details the excitation source features used for recognizing emotions. The motivation for exploring excitation source information for emotion recognition is illustrated by demonstrating speech files that retain source information alone. The extraction of the proposed excitation source features is then described: (i) the sequence of LP residual samples, (ii) the LP residual phase, (iii) epoch parameters, and (iv) glottal pulse parameters. Two emotional speech databases are introduced to validate the proposed features. The functioning of classification models such as auto-associative neural networks and support vector machines is briefly explained. Finally, recognition performance using the proposed excitation source features is analyzed in detail.
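To make the LP residual feature concrete, the sketch below shows one common way to obtain the residual: frame-wise linear prediction (LP) analysis followed by inverse filtering, so that the vocal-tract (system) contribution is removed and mostly excitation source information remains. This is a minimal illustration rather than the authors' exact pipeline; the choice of libraries (librosa, scipy), the LP order, the frame length, and the file name are assumptions made for the example.

    import numpy as np
    import librosa
    from scipy.signal import lfilter

    def lp_residual(signal, order=16, frame_len=400):
        """Frame-wise LP analysis followed by inverse filtering.

        Returns an approximation of the excitation source signal
        (the LP residual) with the same length as the input.
        """
        residual = np.zeros_like(signal)
        for start in range(0, len(signal) - frame_len + 1, frame_len):
            frame = signal[start:start + frame_len]
            if np.max(np.abs(frame)) < 1e-6:
                continue  # skip near-silent frames; LP analysis is ill-conditioned there
            # LP coefficients [1, a1, ..., ap] via the autocorrelation method
            a = librosa.lpc(frame, order=order)
            # Inverse filtering with A(z): what the all-pole vocal-tract model
            # cannot explain is (approximately) the excitation.
            residual[start:start + frame_len] = lfilter(a, [1.0], frame)
        return residual

    # Usage ("speech.wav" is a placeholder path): 16 kHz speech,
    # LP order chosen by the common rule of thumb fs/1000 + 2.
    y, sr = librosa.load("speech.wav", sr=16000)
    e = lp_residual(y, order=int(sr / 1000) + 2, frame_len=int(0.025 * sr))

The residual signal obtained this way, together with its phase, epoch locations, and glottal pulse characteristics, is the raw material from which the features listed above are derived.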

Copyright information

© 2013 Springer Science+Business Media New York

About this chapter

Cite this chapter

Krothapalli, S.R., Koolagudi, S.G. (2013). Emotion Recognition Using Excitation Source Information. In: Emotion Recognition using Speech Features. SpringerBriefs in Electrical and Computer Engineering. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-5143-3_3

  • DOI: https://doi.org/10.1007/978-1-4614-5143-3_3

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-5142-6

  • Online ISBN: 978-1-4614-5143-3

  • eBook Packages: Engineering (R0)
