Abstract
Stress detection from speech is less explored than Automatic Emotion Recognition, and it is still unclear which features best discriminate stress. The VOCE project aims to classify speech as stressed or not stressed in real time, using acoustic-prosodic features only. We therefore search for the most discriminative feature subsets within a set of 6125 features extracted with the openSMILE toolkit plus 160 Teager Energy Operator (TEO) features. We use a Mutual Information (MI) filter and a branch and bound wrapper heuristic with an SVM classifier to perform feature selection. Because many feature sets are selected, we analyse them in terms of the chosen features and of classifier performance, including true positive and false positive rates. The results show that the best feature types for our application case are Audio Spectral, MFCC, PCM and TEO. The best generalisation accuracy reached was 70.4 %.
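As a rough illustration of the filter step named in the abstract, the sketch below ranks features by their mutual information with a binary stressed/not-stressed label. It is not the authors' implementation: the feature names and toy values are hypothetical, the features are assumed to be already discretised (continuous acoustic features would first need binning), and the branch and bound wrapper with the SVM is not shown.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) / (p(x) * p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(columns, labels):
    """Return feature names sorted by decreasing MI with the labels."""
    scores = {name: mutual_information(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical, not taken from the VOCE corpus).
labels = [1, 1, 0, 0]  # 1 = stressed, 0 = not stressed
columns = {
    "informative": [1, 1, 0, 0],    # identical to the labels
    "uninformative": [0, 1, 0, 1],  # statistically independent of the labels
}
ranking = rank_features(columns, labels)
```

A filter of this kind scores each feature independently of any classifier, which keeps it cheap enough to prune a large candidate set before a more expensive wrapper search.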
Notes
- 1.
- 2.
Please refer to [3] for details on the collection methodology.
- 3.
Utterances remaining after discarding 94 utterances shorter than 1 s or longer than 25 s.
- 4.
Note that the stated number of utterances in the train set is the number actually used after discarding part of the neutral utterances, not the number of utterances in the natural set.
- 5.
The generic designation “type” results from aggregating Low Level Descriptor features with their derived functionals (e.g., quartiles, percentiles, means, maxima, minima). In our view, this grouping makes the performance of the features easier to interpret.
- 6.
This value was found empirically to produce the best classification results.
- 7.
Generalisation Accuracy, Sensitivity and Specificity are computed on the test set.
- 8.
With TP the number of True Positives, TN the number of True Negatives, FP the number of False Positives and FN the number of False Negatives, Sensitivity = \(\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}\) and Specificity = \(\frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FP}}\).
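The sensitivity and specificity definitions in note 8 can be checked on a small example. The sketch below counts TP, TN, FP and FN from two label lists and applies the two formulas. The label vectors are made-up toy data, not results from the paper, and the function names are illustrative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN, treating `positive` as the stressed class."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p != positive:
            tn += 1
        elif t != positive and p == positive:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


# Toy test-set labels (hypothetical): 1 = stressed, 0 = not stressed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
sens = sensitivity(tp, fn)                 # 3 / 4
spec = specificity(tn, fp)                 # 4 / 6
accuracy = (tp + tn) / (tp + tn + fp + fn)  # 7 / 10
```

Reporting both rates alongside accuracy matters when the classes are imbalanced, since a classifier can reach a high accuracy while missing most of the stressed utterances.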
References
Abad, A., Astudillo, R.F., Trancoso, I.: The L2F spoken web search system for MediaEval 2013. In: Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop, Barcelona, Spain, 18–19 October 2013 (2013)
Aguiar, A., Kaiseler, M., Meinedo, H., Almeida, P., Cunha, M., Silva, J.: VOCE corpus: ecologically collected speech annotated with physiological and psychological stress assessments. In: Calzolari, N., Choukri, K., Declerck, T., Loftsson, H., Maegaard, B., Mariani, J., Moreno, A., Odijk, J., Piperidis, S. (eds.) Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014). European Language Resources Association (ELRA), Reykjavik (2014)
Aguiar, A.C., Kaiseler, M., Meinedo, H., Abrudan, T.E., Almeida, P.R.: Speech stress assessment using physiological and psychological measures. In: Mattern, F., Santini, S., Canny, J.F., Langheinrich, M., Rekimoto, J. (eds.) UbiComp (Adjunct Publication), pp. 921–930. ACM (2013)
Allen, M.T., Boquet, A.J., Shelley, K.S.: Cluster analyses of cardiovascular responsivity to three laboratory stressors. Psychosom. Med. 53(3), 272–288 (1991)
Batista, F., Moniz, H., Trancoso, I., Mamede, N.J.: Bilingual experiments on automatic recovery of capitalization and punctuation of automatic speech transcripts. IEEE Trans. Audio Speech Lang. Process. 20(2), 474–485 (2012)
Cairns, D.A., Hansen, J.H.L., Kaiser, J.F.: Recent advances in hypernasal speech detection using the nonlinear Teager energy operator. In: ICSLP 1996, p. 1 (1996)
Demenko, G.: Voice stress extraction. In: Proceedings of the Speech Prosody 2008 Conference (2008)
Demenko, G., Jastrzebska, M.: Analysis of voice stress in call centers conversations. In: Proceedings of Speech Prosody, 6th International Conference, Shanghai, China (2012)
Eyben, F., Wöllmer, M., Schuller, B.: openSMILE: the Munich versatile and fast open-source audio feature extractor. In: Bimbo, A.D., Chang, S.F., Smeulders, A.W.M. (eds.) ACM Multimedia, pp. 1459–1462. ACM (2010)
Fernandez, R., Picard, R.W.: Modeling drivers’ speech under stress. Speech Commun. 40(1–2), 145–159 (2003)
Ferreira, J., Meinedo, H.: VOCE project stress feature survey technical report 2. Technical report, L2F, Inesc-ID, Lisboa, Portugal, November 2013
Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)
Hansen, J.H., Bou-Ghazale, S.E., Sarikaya, R., Pellom, B.: Getting started with the SUSAS: a speech under simulated and actual stress database. Technical Report: RSPL-98-10 (1998)
Hansen, J.H., Patil, S.A.: Speech under stress: Analysis, modeling and recognition (2007)
Jabloun, F., Cetin, A.E., Erzin, E.: Teager energy based feature parameters for speech recognition in car noise. IEEE Sig. Process. Lett. 6, 259–261 (1999)
Kumar, V., Minz, S.: Feature selection: a literature review. Smart CR 4(3), 211–229 (2014)
Lu, H., Frauendorfer, D., Rabbi, M., Mast, M.S., Chittaranjan, G.T., Campbell, A.T., Gatica-Perez, D., Choudhury, T.: StressSense: detecting stress in unconstrained acoustic environments using smartphones. In: Proceedings of the 2012 ACM Conference on Ubiquitous Computing, UbiComp 2012, pp. 351–360. ACM, New York (2012). http://doi.acm.org/10.1145/2370216.2370270
Miller, T.C., Stone, D.N.: Public speaking apprehension (PSA), motivation, and affect among accounting majors: a proof-of-concept intervention. Issues Account. Educ. 24(3), 265–298 (2009)
Sundaram, N., Smolenski, B., Yantorno, R.: Instantaneous nonlinear Teager energy operator for robust voiced-unvoiced speech classification (2003)
Pearson, R.K. (ed.): Exploring Data in Engineering, the Sciences, and Medicine. Oxford University Press, USA (2011)
Sarikaya, R., Gowdy, J.N.: Subband based classification of speech under stress. In: ICASSP, pp. 569–572 (1998)
Scherer, K.R., Grandjean, D., Johnstone, T., Klasmeyer, G., Bänziger, T.: Acoustic correlates of task load and stress. In: Hansen, J.H.L., Pellom, B.L. (eds.) INTERSPEECH. ISCA (2002)
Schuller, B., Steidl, S., Batliner, A., Burkhardt, F., Devillers, L., Müller, C., Narayanan, S.: Paralinguistics in speech and language: state-of-the-art and the challenge. Comput. Speech Lang. 27(1), 4–39 (2013)
Schuller, B., Batliner, A., Seppi, D., Steidl, S., Vogt, T., Wagner, J., Devillers, L., Vidrascu, L., Amir, N., Kessous, L., Aharonson, V.: The relevance of feature type for the automatic classification of emotional user states: low level descriptors and functionals. In: INTERSPEECH, pp. 2253–2256. ISCA (2007)
Schuller, B., Steidl, S., Batliner, A., Nöth, E., Vinciarelli, A., Burkhardt, F., van Son, R., Weninger, F., Eyben, F., Bocklet, T., Mohammadi, G., Weiss, B.: The INTERSPEECH 2012 speaker trait challenge. In: INTERSPEECH. ISCA (2012)
Sun, X.: A pitch determination algorithm based on subharmonic-to-harmonic ratio. In: the 6th International Conference of Spoken Language Processing, pp. 676–679 (2000)
Sun, Z., Li, Z.: Data intensive parallel feature selection method study. In: 2014 International Joint Conference on Neural Networks (IJCNN), pp. 2256–2262, July 2014
Vogt, T., André, E., Wagner, J.: Automatic recognition of emotions from speech: a review of the literature and recommendations for practical realisation. In: Peter, C., Beale, R. (eds.) Affect and Emotion in Human-Computer Interaction. LNCS, vol. 4868, pp. 75–91. Springer, Heidelberg (2008)
Wells, J.: Handbook of Standards and Resources for Spoken Language Systems. Mouton de Gruyter, Berlin (1997)
Wolpert, D.H.: The lack of a priori distinctions between learning algorithms. Neural Comput. 8(7), 1341–1390 (1996)
Zhou, G., Hansen, J., Kaiser, J.: Nonlinear feature based classification of speech under stress. IEEE Trans. Speech Audio Process. 9, 201–216 (2001)
Zuo, X., Fung, P.N.: A cross gender and cross lingual study on acoustic features for stress recognition in speech. In: Proceedings 17th International Congress of Phonetic Sciences (ICPhS XVII), Hong Kong, pp. 2336–2339 (2011)
Acknowledgments
This work was supported by national funds through Fundação para a Ciência e Tecnologia (FCT) by project VOCE (Voice Coach for Reduced Stress) PTDC/EEA-ELC/121018/2010, UID/CEC/50021/2013, and Post-doc grant SFRH/PBD/95849/2013.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Julião, M., Silva, J., Aguiar, A., Moniz, H., Batista, F. (2015). Speech Features for Discriminating Stress Using Branch and Bound Wrapper Search. In: Sierra-Rodríguez, JL., Leal, JP., Simões, A. (eds) Languages, Applications and Technologies. SLATE 2015. Communications in Computer and Information Science, vol 563. Springer, Cham. https://doi.org/10.1007/978-3-319-27653-3_1
DOI: https://doi.org/10.1007/978-3-319-27653-3_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27652-6
Online ISBN: 978-3-319-27653-3
eBook Packages: Computer Science, Computer Science (R0)