Speech Features for Discriminating Stress Using Branch and Bound Wrapper Search

  • Conference paper
Languages, Applications and Technologies (SLATE 2015)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 563)

Abstract

Stress detection from speech is a less explored field than Automatic Emotion Recognition, and it is still not clear which features are the better stress discriminants. The VOCE project aims to classify speech as stressed or not stressed in real time, using acoustic-prosodic features only. We therefore look for the best discriminating feature subsets within a set of 6125 features extracted with the openSMILE toolkit plus 160 Teager Energy Operator (TEO) features. We use a Mutual Information (MI) filter and a branch and bound wrapper heuristic with an SVM classifier to perform feature selection. Since many feature sets are selected, we analyse them in terms of the chosen features and of classifier performance, also considering true positive and false positive rates. The results show that the best feature types for our application case are Audio Spectral, MFCC, PCM and TEO. The best generalisation accuracy reached was 70.4%.
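As an illustration of the filter stage mentioned in the abstract, the following minimal sketch ranks features by their estimated mutual information with the stressed/not-stressed label. The abstract does not say how MI is estimated, so the equal-width histogram estimator, the bin count, and the names `mutual_information` and `rank_features` are all assumptions made for this example, not the authors' implementation.

```python
import math
from collections import Counter


def mutual_information(feature, labels, bins=4):
    """Estimate MI (in bits) between one real-valued feature and class labels.

    Assumption: the feature is discretised into equal-width histogram bins;
    the paper's actual estimator is not stated in the abstract.
    """
    lo, hi = min(feature), max(feature)
    width = (hi - lo) / bins or 1.0  # constant feature: put everything in bin 0
    binned = [min(int((v - lo) / width), bins - 1) for v in feature]

    n = len(labels)
    joint = Counter(zip(binned, labels))  # counts of (bin, label) pairs
    p_bin = Counter(binned)               # marginal counts per bin
    p_lab = Counter(labels)               # marginal counts per label

    mi = 0.0
    for (b, y), c in joint.items():
        # p(b, y) * log2( p(b, y) / (p(b) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (p_bin[b] * p_lab[y]))
    return mi


def rank_features(columns, labels, bins=4):
    """Return feature names sorted from most to least informative."""
    scores = {name: mutual_information(vals, labels, bins)
              for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): 1 = stressed, 0 = not stressed.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "noise": [0.1, 0.9, 0.2, 0.8, 0.1, 0.9, 0.2, 0.8],
    "informative": [0.1, 0.2, 0.15, 0.05, 0.9, 0.8, 0.95, 0.85],
}
ranking = rank_features(columns, labels)
```

In a filter-then-wrapper pipeline of the kind the abstract describes, a ranking like this would typically be used to shrink the candidate pool before the more expensive wrapper search evaluates subsets with the classifier.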

Notes

  1. http://paginas.fe.up.pt/~voce.

  2. Please refer to [3] for details on the collection methodology.

  3. Utterances remaining after discarding 94 utterances shorter than 1 s or longer than 25 s.

  4. Please note that the stated number of utterances in the train set corresponds to the number actually used after discarding part of the neutral utterances, and not to the number of utterances in the natural set.

  5. The generic designation “type” is the result of aggregating Low Level Descriptor features with their derived functionals (e.g., quartiles, percentiles, means, maxima, minima). This procedure is, in our perspective, a way to better group and interpret the performance of the features.

  6. This value was found empirically to produce the best classification results.

  7. Generalisation Accuracy, Sensitivity and Specificity are computed on the test set.

  8. With TP, TN, FP and FN denoting the numbers of True Positives, True Negatives, False Positives and False Negatives, respectively, \(\mathrm{Sensitivity}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}\) and \(\mathrm{Specificity}=\frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FP}}\).
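The definitions in note 8 can be sketched in a few lines of code. The function names and the toy label vectors below are hypothetical and serve only to make the two ratios concrete; the stressed class is assumed to be the positive class (label 1).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for a binary problem."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1  # positive example missed by the classifier
        elif pred == positive:
            fp += 1  # negative example flagged as positive
        else:
            tn += 1
    return tp, tn, fp, fn


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); Specificity = TN / (TN + FP)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy example (hypothetical labels): 1 = stressed, 0 = not stressed.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Sensitivity is the true positive rate, and the false positive rate mentioned in the abstract is 1 minus the specificity.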

References

  1. Abad, A., Astudillo, R.F., Trancoso, I.: The L2F spoken web search system for MediaEval 2013. In: Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop, Barcelona, Spain, 18–19 October 2013 (2013)

  2. Aguiar, A., Kaiseler, M., Meinedo, H., Almeida, P., Cunha, M., Silva, J.: VOCE corpus: ecologically collected speech annotated with physiological and psychological stress assessments. In: Calzolari, N., Choukri, K., Declerck, T., Loftsson, H., Maegaard, B., Mariani, J., Moreno, A., Odijk, J., Piperidis, S. (eds.) Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014). European Language Resources Association (ELRA), Reykjavik (2014)

  3. Aguiar, A.C., Kaiseler, M., Meinedo, H., Abrudan, T.E., Almeida, P.R.: Speech stress assessment using physiological and psychological measures. In: Mattern, F., Santini, S., Canny, J.F., Langheinrich, M., Rekimoto, J. (eds.) UbiComp (Adjunct Publication), pp. 921–930. ACM (2013)

  4. Allen, M.T., Boquet, A.J., Shelley, K.S.: Cluster analyses of cardiovascular responsivity to three laboratory stressors. Psychosom. Med. 53(3), 272–288 (1991)

  5. Batista, F., Moniz, H., Trancoso, I., Mamede, N.J.: Bilingual experiments on automatic recovery of capitalization and punctuation of automatic speech transcripts. IEEE Trans. Audio Speech Lang. Process. 20(2), 474–485 (2012)

  6. Cairns, D.A., Hansen, J.H.L., Kaiser, J.F.: Recent advances in hypernasal speech detection using the nonlinear teager energy operator. In: ICSLP 1996, p. 1 (1996)

  7. Demenko, G.: Voice stress extraction. In: Proceedings of the Speech Prosody 2008 Conference (2008)

  8. Demenko, G., Jastrzebska, M.: Analysis of voice stress in call centers conversations. In: Proceedings of Speech Prosody, 6th International Conference, Shanghai, China (2012)

  9. Eyben, F., Wöllmer, M., Schuller, B.: openSMILE: the Munich versatile and fast open-source audio feature extractor. In: Bimbo, A.D., Chang, S.F., Smeulders, A.W.M. (eds.) ACM Multimedia, pp. 1459–1462. ACM (2010)

  10. Fernandez, R., Picard, R.W.: Modeling drivers’ speech under stress. Speech Commun. 40(1–2), 145–159 (2003)

  11. Ferreira, J., Meinedo, H.: VOCE project stress feature survey technical report 2. Technical report, L2F, Inesc-ID, Lisboa, Portugal, November 2013

  12. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)

  13. Hansen, J.H., Bou-Ghazale, S.E., Sarikaya, R., Pellom, B.: Getting started with the SUSAS: a speech under simulated and actual stress database. Technical Report: RSPL-98-10 (1998)

  14. Hansen, J.H., Patil, S.A.: Speech under stress: Analysis, modeling and recognition (2007)

  15. Jabloun, F., Cetin, A.E., Erzin, E.: Teager energy based feature parameters for speech recognition in car noise. IEEE Sig. Process. Lett. 6, 259–261 (1999)

  16. Kumar, V., Minz, S.: Feature selection: a literature review. Smart CR 4(3), 211–229 (2014)

  17. Lu, H., Frauendorfer, D., Rabbi, M., Mast, M.S., Chittaranjan, G.T., Campbell, A.T., Gatica-Perez, D., Choudhury, T.: StressSense: detecting stress in unconstrained acoustic environments using smartphones. In: Proceedings of the 2012 ACM Conference on Ubiquitous Computing, UbiComp 2012, pp. 351–360. ACM, New York (2012). http://doi.acm.org/10.1145/2370216.2370270

  18. Miller, T.C., Stone, D.N.: Public speaking apprehension (PSA), motivation, and affect among accounting majors: a proof-of-concept intervention. Issues Account. Educ. 24(3), 265–298 (2009)

  19. Sundaram, N., Smolenski, B., Yantorno, R.: Instantaneous nonlinear teager energy operator for robust voiced-unvoiced speech classification (2003)

  20. Pearson, R.K. (ed.): Exploring Data in Engineering, the Sciences, and Medicine. Oxford University Press, USA (2011)

  21. Sarikaya, R., Gowdy, J.N.: Subband based classification of speech under stress. In: ICASSP, pp. 569–572 (1998)

  22. Scherer, K.R., Grandjean, D., Johnstone, T., Klasmeyer, G., Bänziger, T.: Acoustic correlates of task load and stress. In: Hansen, J.H.L., Pellom, B.L. (eds.) INTERSPEECH. ISCA (2002)

  23. Schuller, B., Steidl, S., Batliner, A., Burkhardt, F., Devillers, L., Müller, C., Narayanan, S.: Paralinguistics in speech and language: state-of-the-art and the challenge. Comput. Speech Lang. 27(1), 4–39 (2013)

  24. Schuller, B., Batliner, A., Seppi, D., Steidl, S., Vogt, T., Wagner, J., Devillers, L., Vidrascu, L., Amir, N., Kessous, L., Aharonson, V.: The relevance of feature type for the automatic classification of emotional user states: low level descriptors and functionals. In: INTERSPEECH, pp. 2253–2256. ISCA (2007)

  25. Schuller, B., Steidl, S., Batliner, A., Nöth, E., Vinciarelli, A., Burkhardt, F., van Son, R., Weninger, F., Eyben, F., Bocklet, T., Mohammadi, G., Weiss, B.: The INTERSPEECH 2012 speaker trait challenge. In: INTERSPEECH. ISCA (2012)

  26. Sun, X.: A pitch determination algorithm based on subharmonic-to-harmonic ratio. In: the 6th International Conference of Spoken Language Processing, pp. 676–679 (2000)

  27. Sun, Z., Li, Z.: Data intensive parallel feature selection method study. In: 2014 International Joint Conference on Neural Networks (IJCNN), pp. 2256–2262, July 2014

  28. Vogt, T., André, E., Wagner, J.: Automatic recognition of emotions from speech: a review of the literature and recommendations for practical realisation. In: Peter, C., Beale, R. (eds.) Affect and Emotion in Human-Computer Interaction. LNCS, vol. 4868, pp. 75–91. Springer, Heidelberg (2008)

  29. Wells, J.: Handbook of Standards and Resources for Spoken Language Systems. Mouton de Gruyter, Berlin (1997)

  30. Wolpert, D.H.: The lack of a priori distinctions between learning algorithms. Neural Comput. 8(7), 1341–1390 (1996)

  31. Zhou, G., Hansen, J., Kaiser, J.: Nonlinear feature based classification of speech under stress. IEEE Trans. Speech Audio Process. 9, 201–216 (2001)

  32. Zuo, X., Fung, P.N.: A cross gender and cross lingual study on acoustic features for stress recognition in speech. In: Proceedings 17th International Congress of Phonetic Sciences (ICPhS XVII), Hong Kong, pp. 2336–2339 (2011)

Acknowledgments

This work was supported by national funds through Fundação para a Ciência e Tecnologia (FCT) by project VOCE (Voice Coach for Reduced Stress) PTDC/EEA-ELC/121018/2010, UID/CEC/50021/2013, and Post-doc grant SFRH/PBD/95849/2013.

Author information

Corresponding author

Correspondence to Mariana Julião.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Julião, M., Silva, J., Aguiar, A., Moniz, H., Batista, F. (2015). Speech Features for Discriminating Stress Using Branch and Bound Wrapper Search. In: Sierra-Rodríguez, JL., Leal, JP., Simões, A. (eds) Languages, Applications and Technologies. SLATE 2015. Communications in Computer and Information Science, vol 563. Springer, Cham. https://doi.org/10.1007/978-3-319-27653-3_1

  • DOI: https://doi.org/10.1007/978-3-319-27653-3_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27652-6

  • Online ISBN: 978-3-319-27653-3

  • eBook Packages: Computer Science (R0)
