Abstract
Automatic detection of emotional stress is an active research area that has recently drawn increasing attention, mainly in computer science, linguistics, and medicine. This study detects stress automatically from speech-derived features. Related studies use features such as overall intensity, Mel-frequency cepstral coefficients (MFCCs), the Teager Energy Operator, and pitch. The present study proposes a novel set of features based on the spectral tilt of the glottal source and of the speech signal itself. The proposed features rely on the Probability Density Function of the estimated spectral slopes: they consist of the three most probable slopes of the glottal source and the corresponding three slopes of the speech signal, both obtained at the word level. The method is evaluated on the simulated dataset of the SUSAS corpus, where it achieves a recognition accuracy of \(92.06\%\) with a Random Forests classifier.
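The word-level feature described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the slopes or their density are estimated, so this sketch assumes a least-squares line fitted to the dB magnitude spectrum of each frame and a simple histogram as the density estimate. All function names, the frame length, the hop size, and the bin width are hypothetical, and the glottal source waveform is assumed to have been estimated beforehand (for example by inverse filtering), which is not shown.

```python
import cmath
import math
from collections import Counter


def spectrum_db(frame):
    """Hann-window a frame and return dB magnitudes for DFT bins 1 .. n/2 - 1."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    out = []
    for k in range(1, n // 2):  # skip the DC bin
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        out.append(20 * math.log10(abs(acc) + 1e-12))  # small floor avoids log(0)
    return out


def spectral_slope(frame, sample_rate):
    """Least-squares slope (dB per kHz) of the frame's magnitude spectrum.

    Assumption: a straight-line fit is used as the slope estimator.
    """
    n = len(frame)
    db = spectrum_db(frame)
    freqs = [k * sample_rate / n / 1000.0 for k in range(1, n // 2)]
    mean_f = sum(freqs) / len(freqs)
    mean_d = sum(db) / len(db)
    var = sum((f - mean_f) ** 2 for f in freqs)
    if var == 0:
        return 0.0
    cov = sum((f - mean_f) * (d - mean_d) for f, d in zip(freqs, db))
    return cov / var


def frame_slopes(signal, sample_rate, frame_len=256, hop=128):
    """Slope of every full frame of one word's signal."""
    return [spectral_slope(signal[s:s + frame_len], sample_rate)
            for s in range(0, len(signal) - frame_len + 1, hop)]


def most_probable_slopes(slopes, k=3, bin_width=1.0):
    """Centres of the k fullest histogram bins (histogram stands in for the PDF)."""
    if not slopes:
        raise ValueError("need at least one slope estimate")
    counts = Counter(math.floor(s / bin_width) for s in slopes)
    centres = [(b + 0.5) * bin_width for b, _ in counts.most_common(k)]
    while len(centres) < k:  # assumption: pad by repeating the last centre
        centres.append(centres[-1])
    return centres


def word_features(speech, glottal, sample_rate, frame_len=256, hop=128):
    """Six features: three glottal-source slopes, then three speech slopes."""
    return (most_probable_slopes(frame_slopes(glottal, sample_rate, frame_len, hop))
            + most_probable_slopes(frame_slopes(speech, sample_rate, frame_len, hop)))
```

Calling `word_features(speech, glottal, 8000)` on the samples of one word yields a six-element vector that could be passed to any classifier, such as a random forest. The histogram bin width controls how coarse the density estimate is; a kernel density estimate would be a reasonable alternative.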
Notes
1. The glottal source signal is the signal generated at the glottis; it can consist of either periodic pulses or noise.
Acknowledgments
The authors acknowledge support from the iManageCancer EU project under contract H2020-PHC-26-2014 No. 643529.
Copyright information
© 2018 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Simantiraki, O., Giannakakis, G., Pampouchidou, A., Tsiknakis, M. (2018). Stress Detection from Speech Using Spectral Slope Measurements. In: Oliver, N., Serino, S., Matic, A., Cipresso, P., Filipovic, N., Gavrilovska, L. (eds) Pervasive Computing Paradigms for Mental Health. FABULOUS MindCare IIOT 2016 2016 2015. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 207. Springer, Cham. https://doi.org/10.1007/978-3-319-74935-8_5
DOI: https://doi.org/10.1007/978-3-319-74935-8_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-74934-1
Online ISBN: 978-3-319-74935-8
eBook Packages: Computer Science, Computer Science (R0)