Cross-Lingual Vocal Emotion Recognition in Five Native Languages of Assam Using Eigenvalue Decomposition

  • Aditya Bihar Kandali
  • Aurobinda Routray
  • Tapan Kumar Basu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5909)


This work investigates whether vocal expressions of full-blown discrete emotions can be recognized cross-lingually. The study is intended to provide more information about the nature and function of emotion. It should also help in developing a generalized vocal emotion recognition system, which would improve the efficiency of human-machine interaction systems. An emotional speech database was created with 140 simulated utterances per speaker (20 per emotion), consisting of short sentences expressing six full-blown discrete basic emotions and one 'no-emotion' (i.e. neutral) state, in five native languages (not dialects) of Assam. A new feature set is proposed, based on the Eigenvalues of the Autocorrelation Matrix (EVAM) of each frame of an utterance. A Gaussian Mixture Model is used as the classifier. The performance of the EVAM feature set is compared at two sampling frequencies (44.1 kHz and 8.1 kHz) and with additive white noise at signal-to-noise ratios of 0 dB, 5 dB, 10 dB and 20 dB.
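As a rough illustration of the two signal-processing steps named in the abstract (per-frame eigenvalues of an autocorrelation matrix, and additive white noise at a chosen signal-to-noise ratio), the following minimal sketch may help. It is not the authors' implementation: the function names, the frame length, the hop size, the matrix order of 8, the biased autocorrelation estimate and the descending eigenvalue ordering are all assumptions made here for illustration, since the abstract does not specify them.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (no padding).
    frame_len and hop are illustrative choices, not taken from the paper."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def evam_features(frame, order=8):
    """Eigenvalues of the order-by-order autocorrelation matrix of one frame.
    The order and the biased estimator are assumptions of this sketch."""
    frame = frame - frame.mean()
    n = len(frame)
    # Biased autocorrelation estimates for lags 0 .. order-1
    r = np.array([np.dot(frame[:n - k], frame[k:]) / n for k in range(order)])
    # Symmetric Toeplitz matrix with R[i, j] = r[|i - j|]
    lags = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    R = r[lags]
    # eigvalsh returns ascending eigenvalues; reverse to descending order
    return np.linalg.eigvalsh(R)[::-1]

def add_white_noise(x, snr_db, rng):
    """Add white Gaussian noise so that the expected SNR equals snr_db."""
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)

# Synthetic stand-in for a speech signal (one second of a 200 Hz tone)
rng = np.random.default_rng(0)
fs = 8000  # illustrative sampling rate, not one of the paper's two rates
x = np.sin(2 * np.pi * 200 * np.arange(fs) / fs)

noisy = add_white_noise(x, snr_db=10, rng=rng)
frames = frame_signal(noisy, frame_len=256, hop=128)
features = np.array([evam_features(f) for f in frames])
print(features.shape)  # one row of eigenvalues per frame
```

In a setup like the one described, the per-frame eigenvalue vectors would then be passed to the classifier. The sketch stops at feature extraction because the abstract gives no classifier configuration.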


Keywords: Full-blown Basic Emotion; Cross-lingual Vocal Emotion Recognition; Gaussian Mixture Model; Eigenvalues of Autocorrelation Matrix



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Aditya Bihar Kandali (1)
  • Aurobinda Routray (1)
  • Tapan Kumar Basu (2)
  1. Department of Electrical Engineering, Indian Institute of Technology, Kharagpur, India
  2. Aliah University, Salt Lake City, Kolkata, India
