Analysis of Breathy, Emergency and Pathological Stress Classes

  • Amit Abhishek
  • Suman Deb
  • Samarendra Dandapat
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 748)


Recently, man–machine interaction based on speech recognition has attracted increasing interest in the field of speech processing. The need for machines to understand human stress levels in a speaker-independent manner, so that situations can be prioritized, has grown rapidly. A number of databases have been used for stressed speech recognition, but the majority contain styled emotions and Lombard speech. No studies have been reported on stressed speech under other stress conditions such as emergency, breathy, workload, sleep deprivation and pathological conditions. In this work, a new stressed speech database is recorded covering emergency, breathy and pathological conditions. The database is validated through statistical analysis using two features, the mel-frequency cepstral coefficients (MFCC) and the Fourier parameters (FP). The results show that the recorded stress classes are effectively characterized by these features. A fivefold cross-validation is carried out to verify that the statistical analysis results are independent of the data partition. A support vector machine (SVM) is used to classify the different stress classes.
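The classification pipeline described above (per-utterance features, fivefold cross-validation, SVM classifier) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature vectors here are synthetic stand-ins for the MFCC/FP features computed from the recorded speech, the class names mirror the three recorded stress conditions plus a neutral class, and all dimensions and SVM hyperparameters are assumed for the example.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for per-utterance feature vectors (e.g. 13 MFCCs).
# In the real pipeline these would be extracted from the recorded speech.
rng = np.random.default_rng(0)
n_per_class, n_feats = 40, 13
classes = ["breathy", "emergency", "pathological", "neutral"]  # assumed labels

# Each stress class is modeled as a shifted Gaussian cloud so the
# classes are separable, mimicking features that characterize stress.
X = np.vstack([rng.normal(loc=i, scale=1.0, size=(n_per_class, n_feats))
               for i in range(len(classes))])
y = np.repeat(np.arange(len(classes)), n_per_class)

# SVM classifier with fivefold cross-validation, as in the paper.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")   # hyperparameters assumed
scores = cross_val_score(clf, X, y, cv=5)
print(len(scores), round(float(scores.mean()), 3))
```

Cross-validation here serves the same purpose stated in the abstract: the mean accuracy over the five folds shows whether the result holds independently of how the data are split.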


Keywords: Emotion · Stress · Breathy · Emergency · Pathological



The authors would like to thank all the speakers who participated in the data recordings for the IITG-Stress database.



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati, India
