
Audio Features

Part of the book series: Signals and Communication Technology (SCT)

Abstract

To represent the information contained in an audio signal (or stream) in a compact way, focussing on a task of interest, a parameterised form is usually chosen. These parameters describe properties of the audio, usually in a highly information-reduced form and typically at a considerably lower rate, such as the mean energy or pitch over a longer period of time. As different Intelligent Audio Analysis tasks are often best represented by different such 'features', a broad selection of the most typical ones is presented. This includes a description of the digitalisation and segmentation of the audio as a first step. Features from the speech domain include intensity, zero-crossings, autocorrelation, spectrum and cepstrum, linear prediction, line spectral pairs, perceptual linear prediction, formants, fundamental frequency and voicing probability, and jitter and shimmer. Further, music, sound, and textual descriptors are included. Then, the principle of supra-segmental brute-forcing and of subsequent reduction and selection is explained. The widely used openSMILE feature extractor serves as an example.

The ability to focus attention on important things is a defining characteristic of intelligence.

—Robert J. Shiller.
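To make the abstract's pipeline concrete, the following is a minimal sketch in Python, not the chapter's implementation: the digitised signal is segmented into short frames, two illustrative low-level descriptors (log energy and zero-crossing rate) are computed per frame, and a small set of statistical functionals is then 'brute-forced' over the resulting contours. All function names, the chosen descriptors, and the functional set are assumptions for illustration; tools such as openSMILE apply the same principle at far larger scale.

    # Minimal sketch (not the chapter's implementation) of the pipeline the
    # abstract outlines: frame the digitised signal, compute low-level
    # descriptor (LLD) contours, then "brute-force" functionals over them.
    import numpy as np

    def frame_signal(x, frame_len=400, hop=160):
        """Split a mono signal into overlapping frames, e.g. 25 ms windows
        with a 10 ms hop at a 16 kHz sampling rate."""
        n = 1 + max(0, (len(x) - frame_len) // hop)
        return np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])

    def low_level_descriptors(frames):
        """Two illustrative per-frame LLDs: log energy and zero-crossing rate."""
        log_energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
        zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
        return {"logE": log_energy, "zcr": zcr}

    # A small functional set; brute-forced sets in practice combine dozens of
    # functionals with dozens of LLDs (and their delta coefficients).
    FUNCTIONALS = {
        "mean": np.mean, "std": np.std, "min": np.min, "max": np.max,
        "range": lambda c: np.max(c) - np.min(c),
    }

    def brute_force_features(llds):
        """Apply every functional to every LLD contour: the result is one
        fixed-length feature vector regardless of the audio's duration."""
        return {f"{lld}_{fn}": f(contour)
                for lld, contour in llds.items()
                for fn, f in FUNCTIONALS.items()}

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        x = rng.standard_normal(16000)  # 1 s of noise as a stand-in signal
        feats = brute_force_features(low_level_descriptors(frame_signal(x)))
        for name, value in sorted(feats.items()):
            print(f"{name}: {value:.3f}")

Collapsing each descriptor contour to a fixed number of functional values yields one feature vector per audio segment regardless of its duration, which is what makes such supra-segmental features directly usable by static classifiers; the subsequent reduction and selection step then prunes the resulting, often very high-dimensional, space.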


Notes

  1. ISO/IEC JTC 1/SC 29/WG 11 N7708.

  2. \((VC)^m\) here means an \(m\)-fold repetition of the string \(VC\).

  3. http://conceptnet.media.mit.edu/

  4. http://commons.media.mit.edu/en/

  5. openNLP notation is followed for POS classes.

  6. Available at: http://opensmile.sourceforge.net/.

  7. http://www.phon.ucl.ac.uk/resource/sfs/

  8. http://cobweb.ecn.purdue.edu/malcolm/interval/1998-010/

  9. http://affect.media.mit.edu/publications.php

  10. http://www.speech.kth.se/snack/

  11. http://libxtract.sourceforge.net/

  12. http://marsyas.sness.net/

  13. https://www.jyu.fi/hum/laitokset/musiikki/en/research/coe/materials/mirtoolbox

  14. A more detailed description can be found in the openSMILE documentation, available in the download package at http://sourceforge.net/projects/opensmile/ (see also the usage sketch after this list).

  15. openSMILE was awarded third place in the ACM Multimedia 2010 Open-Source Software Competition. It has further served as the standard feature extractor for computing baselines and for use by participants in six research challenges.
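As a usage note for the openSMILE extractor referenced in notes 6 and 14: it is driven by configuration files through its SMILExtract command-line tool. The following is a minimal hedged sketch of calling it from Python; the configuration path is an assumption for illustration, and the feature-set configurations actually shipped with a given release are listed in its documentation.

    # Minimal sketch of driving the SMILExtract command-line tool from Python.
    # The config path below is an assumption for illustration; the feature-set
    # configs shipped with a given openSMILE release are listed in its docs.
    import subprocess

    def extract_features(wav_path, out_path, config="config/emobase.conf"):
        subprocess.run(
            ["SMILExtract",
             "-C", config,     # feature-set configuration file
             "-I", wav_path,   # input audio file
             "-O", out_path],  # output file (format set by the config)
            check=True)

    # Example (hypothetical paths):
    # extract_features("speech.wav", "speech_features.arff")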


Author information

Correspondence to Björn Schuller.


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Schuller, B. (2013). Audio Features. In: Intelligent Audio Analysis. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36806-6_6

  • DOI: https://doi.org/10.1007/978-3-642-36806-6_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36805-9

  • Online ISBN: 978-3-642-36806-6

  • eBook Packages: Engineering, Engineering (R0)
