Abstract
A central aim of this thesis was to define standard acoustic feature sets for both speech and music that contain a large and comprehensive set of acoustic descriptors. Based on previous efforts to combine features and on the author's experience from evaluations across several databases and tasks, 12 standard acoustic parameter sets have been proposed and thoroughly evaluated for this thesis. These include the acoustic baseline feature sets of the INTERSPEECH challenges on Emotion and Paralinguistics from 2009–2013 (ComParE) as well as those of the Audio-Visual Emotion Challenges (2011–2013). Further, two sets for music processing and two minimalistic speech parameter sets (GeMAPS and eGeMAPS) are proposed.
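All of the parameter sets named above follow the same two-stage paradigm: frame-wise low-level descriptors (LLDs) are computed first, and segment-level statistics (functionals) are then applied over each LLD contour. The following is a minimal, self-contained sketch of that paradigm using a single illustrative LLD (frame RMS energy) and a handful of functionals; the actual sets contain many more LLDs and functionals, and the names and frame parameters here are illustrative, not taken from the openSMILE configurations.

```python
import math

def frame_rms(signal, frame_len=400, hop=160):
    """Frame-wise RMS energy: one example of a low-level descriptor (LLD).

    With a 16 kHz sampling rate, frame_len=400 and hop=160 correspond to
    25 ms frames with a 10 ms shift (common, but illustrative here).
    """
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        frames.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return frames

def functionals(lld):
    """Segment-level statistics (functionals) applied over an LLD contour."""
    n = len(lld)
    mean = sum(lld) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in lld) / n)
    return {"mean": mean, "stddev": std, "min": min(lld), "max": max(lld)}

# Toy segment: 1 s of a 100 Hz sine tone sampled at 16 kHz.
sig = [math.sin(2 * math.pi * 100 * t / 16000) for t in range(16000)]
feats = functionals(frame_rms(sig))
# For a pure sine, every frame RMS is ~1/sqrt(2), so the stddev is ~0.
```

Each functional applied to each LLD yields one element of the final fixed-length feature vector, which is what makes the large brute-forced sets (thousands of features) possible from a few dozen LLDs.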
Notes
1. An article is to appear in IEEE Transactions on Affective Computing (Eyben et al. 2015).
2. According to Schuller et al. (2011a)—and the openSMILE configuration file—the IS11 set contains 4,368 features in total. This is also the size of the baseline feature vectors provided for the challenge. However, the duration of the segment is counted twice there, due to the way it was implemented in the openSMILE configuration file. Thus, the correct number of unique features in IS11 is 4,367.
3. The RTF for the rhythmic features was not evaluated, as they are not implemented in the openSMILE framework.
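Note 2 above describes how one duplicated descriptor (the segment duration, counted twice) inflated the IS11 feature count from 4,367 unique features to 4,368. A duplicate of this kind can be caught by comparing the length of the raw feature-name list against the size of its unique set; the feature names below are illustrative placeholders, not the actual IS11 names.

```python
# Hypothetical excerpt of a baseline feature-name list in which one
# descriptor ("duration") appears twice, mirroring the IS11 situation.
feature_names = [
    "pcm_loudness_sma_mean",
    "pcm_loudness_sma_stddev",
    "mfcc_sma1_mean",
    "duration",
    "duration",  # emitted a second time by the configuration
]

total = len(feature_names)          # size of the provided feature vector
unique = len(set(feature_names))    # number of distinct descriptors
duplicates = total - unique         # 1 here, as with duration in IS11
```

For a real configuration the same check would be run over the full header of the extracted feature file rather than a hand-written list.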
References
R. Banse, K.R. Scherer, Acoustic profiles in vocal emotion expression. J. Personal. Soc. Psychol. 70(3), 614–636 (1996)
A. Batliner, J. Buckow, R. Huber, V. Warnke, E. Nöth, H. Niemann, Prosodic Feature Evaluation: Brute Force or Well Designed? In Proceedings of the 14th ICPhS, vol 3, San Francisco, CA, USA, pp. 2315–2318 (1999)
A. Batliner, S. Steidl, B. Schuller, D. Seppi, T. Vogt, J. Wagner, L. Devillers, L. Vidrascu, V. Aharonson, N. Amir, Whodunnit—Searching for the most important feature types signalling emotional user states in speech. Comput. Speech Lang. 25(1), 4–28 (2011)
A. Batliner, B. Möbius, Prosodic models, automatic speech understanding, and speech synthesis: towards the common ground?, in The Integration of Phonetic Knowledge in Speech Technology, ed. by W. Barry, W. Dommelen (Springer, Dordrecht, 2005), pp. 21–44
F. Eyben, K. Scherer, B. Schuller, J. Sundberg, E. André, C. Busso, L. Devillers, J. Epps, P. Laukka, S. Narayanan, K. Truong, The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing. IEEE Trans. Affect. Comput. doi:10.1109/TAFFC.2015.2457417
F. Eyben, B. Schuller, Music Classification with the Munich openSMILE Toolkit. In Proceedings of the Annual Meeting of the MIREX 2010 community as part of the 11th International Conference on Music Information Retrieval (ISMIR), Utrecht, The Netherlands, August 2010. ISMIR. http://www.music-ir.org/mirex/abstracts/2010/FE1.pdf
P.N. Juslin, P. Laukka, Communication of emotions in vocal expression and music performance: Different channels, same code? Psychol. Bull. 129(5), 770–814 (2003)
E. Marchi, A. Batliner, B. Schuller, S. Fridenzon, S. Tal, O. Golan, Speech, Emotion, Age, Language, Task, and Typicality: Trying to Disentangle Performance and Feature Relevance. In Proceedings of the First International Workshop on Wide Spectrum Social Signal Processing (WS³P 2012), held in conjunction with the ASE/IEEE International Conference on Social Computing (SocialCom 2012), IEEE Computer Society, pp. 961–968, Amsterdam, The Netherlands, September 2012
M. Müller, F. Kurth, M. Clausen. Audio matching via chroma-based statistical features. In Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR), pp. 288–295, London, UK, (2005)
S. Patel, K.R. Scherer, Vocal behaviour, in Handbook of Nonverbal Communication, ed. by J.A. Hall, M.L. Knapp (Mouton-DeGruyter, Berlin, 2013), pp. 167–204
A. Sadeghi Naini, M. Homayounpour, Speaker age interval and sex identification based on jitters, shimmers and mean MFCC using supervised and unsupervised discriminative classification methods. In Proceedings of the 8th International Conference on Signal Processing (ICSP), vol 1, Beijing, China, 2006. doi:10.1109/ICOSP.2006.345516
K.R. Scherer, Vocal affect expression: A review and a model for future research. Psychol. Bull. 99, 143–165 (1986)
M. Schröder, Speech and Emotion Research: An Overview of Research Frameworks and a Dimensional Approach to Emotional Speech Synthesis, volume PHONUS 7 of Research Report of the Institute of Phonetics, Saarland University. Ph.D. thesis, Institute for Phonetics, University of Saarbrücken, 2004
M. Schröder, F. Burkhardt, S. Krstulovic, Synthesis of emotional speech, in Blueprint for Affective Computing, ed. by K.R. Scherer, T. Bänziger, E. Roesch (Oxford University Press, Oxford, 2010), pp. 222–231
B. Schuller, A. Batliner, D. Seppi, S. Steidl, T. Vogt, J. Wagner, L. Devillers, L. Vidrascu, N. Amir, L. Kessous, V. Aharonson, The Relevance of Feature Type for the Automatic Classification of Emotional User States: Low Level Descriptors and Functionals. In Proceedings of INTERSPEECH 2007, ISCA, pp. 2253–2256, Antwerp, Belgium, August 2007a
B. Schuller, F. Eyben, G. Rigoll, Fast and Robust Meter and Tempo Recognition for the Automatic Discrimination of Ballroom Dance Styles. In Proceedings of the ICASSP 2007, IEEE. vol I, pp 217–220, Honolulu, HI, USA, April 2007b
B. Schuller, A. Batliner, S. Steidl, F. Schiel, J. Krajewski, The INTERSPEECH 2011 Speaker State Challenge. In Proceedings of INTERSPEECH 2011, ISCA, Florence, Italy, pp. 3201–3204, August 2011a
B. Schuller, M. Valstar, F. Eyben, G. McKeown, R. Cowie, M. Pantic, AVEC 2011—The First International Audio/Visual Emotion Challenge, in Proceedings of the First International Audio/Visual Emotion Challenge and Workshop, AVEC 2011, held in conjunction with the International HUMAINE Association Conference on Affective Computing and Intelligent Interaction (ACII) 2011, vol. II, ed. by B. Schuller, M. Valstar, R. Cowie, M. Pantic (Springer, Memphis, TN, USA, October 2011b), pp. 415–424
B. Schuller, G. Rigoll, Recognising Interest in Conversational Speech—Comparing Bag of Frames and Supra-segmental Features. In Proceedings of INTERSPEECH 2009, ISCA pp. 1999–2002, Brighton, UK, September 2009
B. Schuller, G. Rigoll, M. Lang, Hidden Markov Model-based Speech Emotion Recognition. In Proceedings of the ICASSP 2003, IEEE. vol 2, pp. II 1–4, Hong Kong, China, April 2003
B. Schuller, S. Steidl, A. Batliner, F. Burkhardt, L. Devillers, C. Müller, S. Narayanan, The INTERSPEECH 2010 Paralinguistic Challenge. In Proceedings of INTERSPEECH 2010, ISCA, Makuhari, Japan, pp. 2794–2797, September 2010
B. Schuller, S. Steidl, A. Batliner, J. Epps, F. Eyben, F. Ringeval, E. Marchi, Y. Zhang, The INTERSPEECH 2014 computational paralinguistics challenge: Cognitive and physical load. In Proceedings of the INTERSPEECH 2014, ISCA. Singapore, 2014. (to appear)
B. Schuller, S. Steidl, A. Batliner, F. Jurcicek, The INTERSPEECH 2009 Emotion Challenge. In Proceedings of INTERSPEECH 2009, ISCA, Brighton, UK, pp. 312–315, September 2009
B. Schuller, S. Steidl, A. Batliner, E. Nöth, A. Vinciarelli, F. Burkhardt, R. van Son, F. Weninger, F. Eyben, T. Bocklet, G. Mohammadi, B. Weiss, The INTERSPEECH 2012 Speaker Trait Challenge. In Proceedings of INTERSPEECH 2012, ISCA. Portland, OR, USA, September 2012a
B. Schuller, M. Valstar, R. Cowie, M. Pantic, AVEC 2012: the continuous audio/visual emotion challenge—an introduction, in Proceedings of the 14th ACM International Conference on Multimodal Interaction (ICMI) 2012, ed. by L.-P. Morency, D. Bohus, H.K. Aghajan, J. Cassell, A. Nijholt, J. Epps (ACM, Santa Monica, CA, USA, October 2012b), pp. 361–362
B. Schuller, S. Steidl, A. Batliner, A. Vinciarelli, K. Scherer, F. Ringeval, M. Chetouani, et al., The INTERSPEECH 2013 Computational Paralinguistics Challenge: Social Signals, Conflict, Emotion, Autism. In Proceedings of the INTERSPEECH 2013, ISCA, Lyon, France, pp. 148–152, 2013
B. Schuller, A. Batliner, Computational Paralinguistics: Emotion, Affect and Personality in Speech and Language Processing (Wiley, Hoboken, 2013), p. 344. ISBN 978-1119971368
J. Sundberg, S. Patel, E. Bjorkner, K .R. Scherer, Interdependencies among voice source parameters in emotional speech. IEEE Trans. Affect. Comput. 2(3), 162–174 (2011). doi:10.1109/T-AFFC.2011.14. ISSN 1949-3045
M. Valstar, B. Schuller, K. Smith, F. Eyben, B. Jiang, S. Bilakhia, S. Schnieder, R. Cowie, M. Pantic, AVEC 2013—The Continuous Audio/Visual Emotion and Depression Recognition Challenge. In Proceedings of the ACM Multimedia 2013, ACM, Barcelona, Spain, October 2013
D. Ververidis, C. Kotropoulos, Emotional speech recognition: Resources, features, and methods. Speech Commun. 48(9), 1162–1181 (2006)
F. Weninger, F. Eyben, B. W. Schuller, M. Mortillaro, K. R. Scherer, On the Acoustics of Emotion in Audio: What Speech, Music and Sound have in Common. Frontiers in Psychology, 4(Article ID 292): 1–12, May 2013b. doi:10.3389/fpsyg.2013.00292
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this chapter
Eyben, F. (2016). Standard Baseline Feature Sets. In: Real-time Speech and Music Classification by Large Audio Feature Space Extraction. Springer Theses. Springer, Cham. https://doi.org/10.1007/978-3-319-27299-3_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27298-6
Online ISBN: 978-3-319-27299-3