Speech Features for Discriminating Stress Using Branch and Bound Wrapper Search

Julião, Mariana; Silva, Jorge; Aguiar, Ana; Moniz, Helena; Batista, Fernando

doi:10.1007/978-3-319-27653-3_1

Part of the book series: Communications in Computer and Information Science (CCIS, volume 563)

Included in the following conference series:

International Symposium on Languages, Applications and Technologies

Abstract

Stress detection from speech is a less explored field than Automatic Emotion Recognition, and it is still not clear which features are the best stress discriminants. The VOCE project aims to classify speech as stressed or not stressed in real time, using acoustic-prosodic features only. We therefore search for the most discriminative feature subsets within a set of 6125 features extracted with the openSMILE toolkit plus 160 Teager Energy Operator (TEO) features. To perform feature selection, we use a Mutual Information (MI) filter followed by a branch and bound wrapper heuristic with an SVM classifier. Since many feature sets are selected, we analyse them in terms of the features chosen and of classifier performance, including true positive and false positive rates. The results show that the best feature types for our application are Audio Spectral, MFCC, PCM and TEO. We reached a generalisation accuracy of up to 70.4 %.
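The two-stage selection summarised above (an MI filter that shortlists features, then a branch and bound wrapper that scores subsets with a classifier) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: MI is estimated from equal-width histograms (the bin count is an assumption), the SVM is replaced by a nearest-centroid classifier so that only the standard library is needed, and the bound assumes that removing a feature never raises the score. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def discretise(values, bins=5):
    """Equal-width binning so that MI can be estimated from counts."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant feature: avoid dividing by zero
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys):
    """Empirical mutual information, in nats, between two discrete sequences."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    # p(x, y) / (p(x) p(y)) simplifies to c * n / (count(x) * count(y))
    return sum(c / n * math.log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def mi_filter(X, y, k, bins=5):
    """Filter stage: indices of the k features sharing the most MI with the label."""
    scores = []
    for j in range(len(X[0])):
        column = discretise([row[j] for row in X], bins)
        scores.append((mutual_information(column, y), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def centroid_accuracy(subset, X_tr, y_tr, X_va, y_va):
    """Stand-in for the SVM (assumption): nearest-class-centroid accuracy on a validation split."""
    centroids = {}
    for label in set(y_tr):
        rows = [x for x, t in zip(X_tr, y_tr) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in subset]

    def distance(x, label):
        return sum((x[j] - m) ** 2 for j, m in zip(subset, centroids[label]))

    correct = sum(min(centroids, key=lambda c: distance(x, c)) == t
                  for x, t in zip(X_va, y_va))
    return correct / len(y_va)


def branch_and_bound(candidates, target_size, score):
    """Wrapper stage: depth-first removal of features until target_size remain,
    pruning branches that already score no better than the best complete subset."""
    if not 0 < target_size <= len(candidates):
        raise ValueError("need 0 < target_size <= len(candidates)")
    best = {"score": float("-inf"), "subset": None}

    def search(subset, start):
        s = score(subset)
        if s <= best["score"]:
            return  # bound: assumes no smaller subset in this branch can do better
        if len(subset) == target_size:
            best["score"], best["subset"] = s, subset
            return
        # Remove features in increasing position order so each subset is reached once;
        # removing past position target_size would leave too few features to drop.
        for i in range(start, target_size + 1):
            search(subset[:i] + subset[i + 1:], i)

    search(list(candidates), 0)
    return best["subset"], best["score"]


# Hypothetical toy data: feature 0 separates the classes, features 1 and 2 do not.
X_tr = [[0.0, 5.0, 1.0], [0.2, 1.0, 3.0], [1.0, 4.0, 2.0], [1.2, 2.0, 1.0]]
y_tr = [0, 0, 1, 1]
X_va = [[0.1, 3.0, 2.0], [1.1, 3.0, 2.0]]
y_va = [0, 1]

pool = mi_filter(X_tr, y_tr, k=2, bins=2)
print(pool)  # [0, 2]
print(branch_and_bound(pool, 1, lambda s: centroid_accuracy(s, X_tr, y_tr, X_va, y_va)))
# ([0], 1.0)
```

Because classifier accuracy is not monotone in the feature set, the bound can discard the true optimum; this is why such a search is only a heuristic. With a real SVM, `score` would instead return the SVM's held-out accuracy for the given subset.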

Notes

1.
http://paginas.fe.up.pt/~voce.
2.
Please refer to [3] for details on the collection methodology.
3.
Remaining utterances after discarding 94 utterances shorter than 1 s or longer than 25 s.
4.
Please note that the number of utterances stated for the train set is the number actually used after discarding part of the neutral utterances, not the number of utterances in the natural set.
5.
The generic designation “type” results from aggregating Low-Level Descriptor (LLD) features with their derived functionals (e.g., quartiles, percentiles, means, maxima, minima). In our view, this aggregation makes it easier to group the features and to interpret their performance.
6.
This value was found empirically to produce the best classification results.
7.
Generalisation Accuracy, Sensitivity and Specificity are computed on the test set.
8.
With TP, TN, FP and FN denoting the numbers of true positives, true negatives, false positives and false negatives, respectively, Sensitivity = \(\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}\) and Specificity = \(\frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FP}}\).
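The definitions in notes 7 and 8 can be made concrete with a short sketch that counts the confusion-matrix cells on a labelled test set and derives generalisation accuracy, sensitivity and specificity. Treating the stressed class as positive and encoding it as 1 are assumptions, as are the function names and the toy labels. The false positive rate mentioned in the abstract equals 1 - Specificity.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN), treating `positive` (assumed: stressed) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def evaluation_metrics(y_true, y_pred, positive=1):
    """Generalisation accuracy, sensitivity and specificity as defined in note 8."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical test-set labels: 1 = stressed, 0 = not stressed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(evaluation_metrics(y_true, y_pred))
```

Here TP = 3, TN = 4, FP = 2 and FN = 1, giving an accuracy of 0.7, a sensitivity of 0.75 and a specificity of 4/6, about 0.667.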

References

Abad, A., Astudillo, R.F., Trancoso, I.: The L2F spoken web search system for MediaEval 2013. In: Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop, Barcelona, Spain, 18–19 October 2013 (2013)
Aguiar, A., Kaiseler, M., Meinedo, H., Almeida, P., Cunha, M., Silva, J.: VOCE corpus: ecologically collected speech annotated with physiological and psychological stress assessments. In: Calzolari, N., Choukri, K., Declerck, T., Loftsson, H., Maegaard, B., Mariani, J., Moreno, A., Odijk, J., Piperidis, S. (eds.) Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014). European Language Resources Association (ELRA), Reykjavik (2014)
Aguiar, A.C., Kaiseler, M., Meinedo, H., Abrudan, T.E., Almeida, P.R.: Speech stress assessment using physiological and psychological measures. In: Mattern, F., Santini, S., Canny, J.F., Langheinrich, M., Rekimoto, J. (eds.) UbiComp (Adjunct Publication), pp. 921–930. ACM (2013)
Allen, M.T., Boquet, A.J., Shelley, K.S.: Cluster analyses of cardiovascular responsivity to three laboratory stressors. Psychosom. Med. 53(3), 272–288 (1991)
Batista, F., Moniz, H., Trancoso, I., Mamede, N.J.: Bilingual experiments on automatic recovery of capitalization and punctuation of automatic speech transcripts. IEEE Trans. Audio Speech Lang. Process. 20(2), 474–485 (2012)
Cairns, D.A., Hansen, J.H.L., Kaiser, J.F.: Recent advances in hypernasal speech detection using the nonlinear Teager energy operator. In: ICSLP 1996, p. 1 (1996)
Demenko, G.: Voice stress extraction. In: Proceedings of the Speech Prosody 2008 Conference (2008)
Demenko, G., Jastrzebska, M.: Analysis of voice stress in call centers conversations. In: Proceedings of Speech Prosody, 6th International Conference, Shanghai, China (2012)
Eyben, F., Wöllmer, M., Schuller, B.: openSMILE: the Munich versatile and fast open-source audio feature extractor. In: Bimbo, A.D., Chang, S.F., Smeulders, A.W.M. (eds.) ACM Multimedia, pp. 1459–1462. ACM (2010)
Fernandez, R., Picard, R.W.: Modeling drivers’ speech under stress. Speech Commun. 40(1–2), 145–159 (2003)
Ferreira, J., Meinedo, H.: VOCE project stress feature survey technical report 2. Technical report, L2F, Inesc-ID, Lisboa, Portugal, November 2013
Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)
Hansen, J.H., Bou-Ghazale, S.E., Sarikaya, R., Pellom, B.: Getting started with the SUSAS: a speech under simulated and actual stress database. Technical Report: RSPL-98-10 (1998)
Hansen, J.H., Patil, S.A.: Speech under stress: Analysis, modeling and recognition (2007)
Jabloun, F., Cetin, A.E., Erzin, E.: Teager energy based feature parameters for speech recognition in car noise. IEEE Sig. Process. Lett. 6, 259–261 (1999)
Kumar, V., Minz, S.: Feature selection: a literature review. Smart CR 4(3), 211–229 (2014)
Lu, H., Frauendorfer, D., Rabbi, M., Mast, M.S., Chittaranjan, G.T., Campbell, A.T., Gatica-Perez, D., Choudhury, T.: StressSense: detecting stress in unconstrained acoustic environments using smartphones. In: Proceedings of the 2012 ACM Conference on Ubiquitous Computing, UbiComp 2012, pp. 351–360. ACM, New York (2012). http://doi.acm.org/10.1145/2370216.2370270
Miller, T.C., Stone, D.N.: Public speaking apprehension (PSA), motivation, and affect among accounting majors: a proof-of-concept intervention. Issues Account. Educ. 24(3), 265–298 (2009)
Sundaram, N., Smolenski, B., Yantorno, R.: Instantaneous nonlinear Teager energy operator for robust voiced-unvoiced speech classification (2003)
Pearson, R.K. (ed.): Exploring Data in Engineering, the Sciences, and Medicine. Oxford University Press, USA (2011)
Sarikaya, R., Gowdy, J.N.: Subband based classification of speech under stress. In: ICASSP, pp. 569–572 (1998)
Scherer, K.R., Grandjean, D., Johnstone, T., Klasmeyer, G., Bänziger, T.: Acoustic correlates of task load and stress. In: Hansen, J.H.L., Pellom, B.L. (eds.) INTERSPEECH. ISCA (2002)
Schuller, B., Steidl, S., Batliner, A., Burkhardt, F., Devillers, L., Müller, C., Narayanan, S.: Paralinguistics in speech and language: state-of-the-art and the challenge. Comput. Speech Lang. 27(1), 4–39 (2013)
Schuller, B., Batliner, A., Seppi, D., Steidl, S., Vogt, T., Wagner, J., Devillers, L., Vidrascu, L., Amir, N., Kessous, L., Aharonson, V.: The relevance of feature type for the automatic classification of emotional user states: low level descriptors and functionals. In: INTERSPEECH, pp. 2253–2256. ISCA (2007)
Schuller, B., Steidl, S., Batliner, A., Nöth, E., Vinciarelli, A., Burkhardt, F., van Son, R., Weninger, F., Eyben, F., Bocklet, T., Mohammadi, G., Weiss, B.: The INTERSPEECH 2012 speaker trait challenge. In: INTERSPEECH. ISCA (2012)
Sun, X.: A pitch determination algorithm based on subharmonic-to-harmonic ratio. In: The 6th International Conference on Spoken Language Processing, pp. 676–679 (2000)
Sun, Z., Li, Z.: Data intensive parallel feature selection method study. In: 2014 International Joint Conference on Neural Networks (IJCNN), pp. 2256–2262, July 2014
Vogt, T., André, E., Wagner, J.: Automatic recognition of emotions from speech: a review of the literature and recommendations for practical realisation. In: Peter, C., Beale, R. (eds.) Affect and Emotion in Human-Computer Interaction. LNCS, vol. 4868, pp. 75–91. Springer, Heidelberg (2008)
Wells, J.: Handbook of Standards and Resources for Spoken Language Systems. Mouton de Gruyter, Berlin (1997)
Wolpert, D.H.: The lack of a priori distinctions between learning algorithms. Neural Comput. 8(7), 1341–1390 (1996)
Zhou, G., Hansen, J., Kaiser, J.: Nonlinear feature based classification of speech under stress. IEEE Trans. Speech Audio Process. 9, 201–216 (2001)
Zuo, X., Fung, P.N.: A cross gender and cross lingual study on acoustic features for stress recognition in speech. In: Proceedings 17th International Congress of Phonetic Sciences (ICPhS XVII), Hong Kong, pp. 2336–2339 (2011)

Acknowledgments

This work was supported by national funds through Fundação para a Ciência e Tecnologia (FCT), under project VOCE (Voice Coach for Reduced Stress) PTDC/EEA-ELC/121018/2010, project UID/CEC/50021/2013, and post-doc grant SFRH/PBD/95849/2013.

Author information

Authors and Affiliations

Instituto de Telecomunicações, Rua Dr. Roberto Frias, s/n, 4200-465, Porto, Portugal
Mariana Julião, Jorge Silva & Ana Aguiar
INESC-ID, Lisboa, Portugal
Helena Moniz & Fernando Batista
FLUL/CLUL, Universidade de Lisboa, Lisboa, Portugal
Helena Moniz
ISCTE - Instituto Universitário de Lisboa, Lisboa, Portugal
Fernando Batista

Corresponding author

Correspondence to Mariana Julião.

Editor information

Editors and Affiliations

Complutense University of Madrid, Madrid, Spain
José-Luis Sierra-Rodríguez
Universidade do Porto - DCC, Porto, Portugal
José-Paulo Leal
Universidade do Minho, Braga, Portugal
Alberto Simões

About this paper

Cite this paper

Julião, M., Silva, J., Aguiar, A., Moniz, H., Batista, F. (2015). Speech Features for Discriminating Stress Using Branch and Bound Wrapper Search. In: Sierra-Rodríguez, JL., Leal, JP., Simões, A. (eds) Languages, Applications and Technologies. SLATE 2015. Communications in Computer and Information Science, vol 563. Springer, Cham. https://doi.org/10.1007/978-3-319-27653-3_1

DOI: https://doi.org/10.1007/978-3-319-27653-3_1
Published: 24 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27652-6
Online ISBN: 978-3-319-27653-3
eBook Packages: Computer Science, Computer Science (R0)
