Abstract
A huge variety of learning algorithms is applied in the field of Intelligent Audio Analysis, depending on the nature of the target task of interest, such as whether it is static or dynamic. In addition, manifold other factors, such as efficiency or reliability, are decisive when selecting the optimal algorithm. The most frequently encountered representatives are explained in detail: Decision Trees, Support Vector-based approaches, Artificial Neural Networks including the Long Short-Term Memory paradigm, as well as dynamic modelling by hidden Markov models. Furthermore, bootstrapping, meta-learning, and tandem learning are described. This is followed by the evaluation of such algorithms, touching on the partitioning and balancing of data as well as the evaluation measures frequently used in the field.
Learning without thought is labor lost; thought without learning is perilous.
—Confucius.
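The abstract mentions evaluation measures as frequently used in the field; one such measure is unweighted average recall (UAR), employed, for instance, in the INTERSPEECH 2009 Emotion Challenge cited in the references below. The following is a minimal illustrative sketch, not taken from the chapter; the function and label names are hypothetical.

```python
# Minimal illustrative sketch (not from the chapter): unweighted average
# recall (UAR), an evaluation measure frequently used in the field because
# it is robust against imbalanced class distributions.
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of the per-class recalls; every class is weighted equally."""
    correct = defaultdict(int)  # correctly classified instances per class
    total = defaultdict(int)    # total instances per class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced two-class example:
y_true = ["neg", "neg", "neg", "neg", "pos"]
y_pred = ["neg", "neg", "neg", "pos", "pos"]
print(unweighted_average_recall(y_true, y_pred))  # (3/4 + 1/1) / 2 = 0.875
```

Unlike overall accuracy (weighted average recall), UAR does not reward a classifier for simply predicting the majority class, which is why it is preferred when class distributions are skewed.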
Notes
1. Note that, for better readability, the time \(t\) is used in this section in the subscript or as the argument, e.g., \(x_t\) or \(x(t)\), following [32].
References
Kroschel, K., Rigoll, G., Schuller, B.: Statistische Informationstechnik, 5th edn. Springer, Berlin (2011)
Quinlan, J.R.: C4.5: programs for machine learning. Morgan Kaufmann, Burlington (1993)
Quinlan, J.: Learning efficient classification procedures and their application to chess end games. In: Machine learning: an artificial intelligence approach, pp. 106–121. Tioga Publishing, Palo Alto (1983)
Quinlan, J.: Simplifying decision trees. Int. J. Man Mach. Stud. 27, 221–234 (1987)
Quinlan, J.: Bagging, boosting and C4.5. In: Proceedings of the 14th National Conference on AI, vol. 5, pp. 725–730. AAAI Press, Menlo Park (1996)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
Hochreiter, S., Mozer, M., Obermayer, K.: Coulomb classifiers: generalizing support vector machines via an analogy to electrostatic systems. Adv. Neural Inf. Process. Syst. 15 (2002)
Cristianini, N., Shawe-Taylor, J.: An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, Cambridge (2000)
Platt, J.: Sequential minimal optimization: a fast algorithm for training support vector machines. Technical report MSR-TR-98-14, Microsoft Research, Redmond (1998)
Schölkopf, B., Smola, A.: Learning with kernels: support vector machines, regularization, optimization, and beyond (Adaptive computation and machine learning). MIT Press, Cambridge (2002)
Yang, H., Xu, Z., Ye, J., King, I., Lyu, M.: Efficient sparse generalized multiple kernel learning. IEEE Trans. Neural Netw. 22(3), 433–446 (2011)
Lodhi, H., Saunders, C., Shawe-Taylor, J., Cristianini, N., Watkins, C.: Text classification using string kernels. J. Mach. Learn. Res. 2, 419–444 (2002)
Smola, A., Schölkopf, B.: A tutorial on support vector regression. Stat. Comput. 14(3), 199–222 (2004)
Niemann, H.: Klassifikation von Mustern, 2nd, revised and extended edn. Published online (2003)
McCulloch, W., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5, 115–133 (1943)
Schuller, B.: Automatische Emotionserkennung aus sprachlicher und manueller Interaktion. Doctoral thesis, Technische Universität München, Munich (2006)
Rigoll, G.: Neuronale Netze. Expert-Verlag (1994)
Deller, J., Proakis, J., Hansen, J.: Discrete-time processing of speech signals. Macmillan Publishing Company, New York (1993)
Rumelhart, D., Hinton, G., Williams, R.: Learning internal representations by error propagation. In: Parallel distributed processing: explorations in the microstructure of cognition, vol. 1, pp. 318–362. MIT Press, Cambridge (1987)
Schalkoff, R.: Artificial neural networks. McGraw-Hill, New York (1994)
Riedmiller, M., Braun, H.: Rprop—a fast adaptive learning algorithm. In: Proceedings of the International Symposium on Computer and Information Science, vol. 7 (1992)
Lacoste, A., Eck, D.: Onset detection with artificial neural networks. In: MIREX (2005)
Werbos, P.: Backpropagation through time: what it does and how to do it. Proc. IEEE 78(10), 1550–1560 (1990)
Hochreiter, S., Bengio, Y., Frasconi, P., Schmidhuber, J.: Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In: Kremer, S.C., Kolen, J.F. (eds.) A field guide to dynamical recurrent neural networks, pp. 1–15. IEEE Press, New York (2001)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Gers, F., Schmidhuber, J., Cummins, F.: Learning to forget: continual prediction with LSTM. Neural Comput. 12(10), 2451–2471 (2000)
Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 18(5–6), 602–610 (2005)
Graves, A.: Supervised sequence labelling with recurrent neural networks. Ph.D thesis, Technische Universität München, Munich (2008)
Wöllmer, M., Schuller, B., Batliner, A., Steidl, S., Seppi, D.: Tandem decoding of children’s speech for keyword detection in a child-robot interaction scenario. ACM Trans. Speech Lang. Process. (Special Issue Speech Lang. Process. Children’s Speech Child-Mach. Interact. Appl.) 7(4), 22 pages (2011)
Wöllmer, M., Schuller, B., Eyben, F., Rigoll, G.: Combining long short-term memory and dynamic Bayesian networks for incremental emotion-sensitive artificial listening. IEEE J. Sel. Top. Sig. Process. (Special Issue Speech Process. Nat. Interact. Intell. Environ.) 4(5), 867–881 (2010)
Wöllmer, M., Blaschke, C., Schindl, T., Schuller, B., Färber, B., Mayer, S., Trefflich, B.: On-line driver distraction detection using long short-term memory. IEEE Trans. Intell. Transp. Syst. 12(2), 574–582 (2011)
Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77, 257–286 (1989)
O’Shaughnessy, D.: Speech communication, 2nd edn. Addison-Wesley, Boston (1990)
Jelinek, F.: Statistical methods for speech recognition. MIT Press, Cambridge (1997)
Baum, L.E., Petrie, T., Soules, G., Weiss, N.: A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Stat. 41(1), 164–171 (1970)
Ruske, G.: Automatische Spracherkennung: Methoden der Klassifikation und Merkmalsextraktion, 2nd edn. Oldenbourg (1993)
White, C.M., Rastrow, A., Khudanpur, S., Jelinek, F.: Unsupervised estimation of the language model scaling factor. In: Proceedings of Interspeech, pp. 1195–1198, Brighton (2009)
Furui, S.: Digital speech processing, synthesis, and recognition, 2nd edn. Signal Processing and Communications. Marcel Dekker Inc., New York (1996)
Lowerre, B.: The Harpy speech recognition system. Ph.D thesis, Carnegie Mellon University, Pittsburgh (1976)
Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)
Webb, G.: Multiboosting: a technique for combining boosting and wagging. Mach. Learn. 40, 159–198 (2000)
Valiant, L.: A theory of the learnable. Commun. ACM 27(11), 1134–1142 (1984)
Freund, Y., Schapire, R.: Experiments with a new boosting algorithm. In: Proceedings of the International Conference on Machine Learning, pp. 148–156 (1996)
Wolpert, D.: Stacked generalization. Neural Netw. 5, 241–259 (1992)
Ting, K., Witten, I.: Issues in stacked generalization. J. Artif. Intell. Res. 10(1), 271–289 (1999)
Seewald, A.: Towards understanding stacking—Studies of a general ensemble learning scheme. Ph.D thesis, Technische Universität Wien, Vienna (2003)
Schuller, B., Steidl, S., Batliner, A.: The INTERSPEECH 2009 emotion challenge. In: Proceedings of INTERSPEECH 2009, 10th Annual Conference of the International Speech Communication Association, pp. 312–315. ISCA, Brighton (2009)
Schuller, B., Müller, R., Eyben, F., Gast, J., Hörnler, B., Wöllmer, M., Rigoll, G., Höthker, A., Konosu, H.: Being bored? Recognising natural interest by extensive audiovisual integration for real-life application. Image Vision Comput. (Special Issue Visual Multimodal Anal. Human Spontaneous Behav.) 27(12), 1760–1774 (2009)
Schuller, B., Schenk, J., Rigoll, G., Knaup, T.: “The Godfather” vs. “Chaos”: comparing linguistic analysis based on online knowledge sources and bags-of-n-grams for movie review valence estimation. In: Proceedings of the 10th International Conference on Document Analysis and Recognition, ICDAR 2009, pp. 858–862. IAPR, IEEE, Barcelona (2009)
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
Mower, E., Mataric, M., Narayanan, S.: A framework for automatic human emotion classification using emotional profiles. IEEE Trans. Audio Speech Lang. Process. 19(5), 1057–1070 (2011)
Rozeboom, W.: The fallacy of the null-hypothesis significance test. Psychol. Bull. 57, 416–428 (1960)
Nickerson, R.S.: Null hypothesis significance testing: a review of an old and continuing controversy. Psychol. Methods 5, 241–301 (2000)
Eysenck, H.: The concept of statistical significance and the controversy about one-tailed tests. Psychol. Rev. 67, 269–271 (1960)
Gillick, L., Cox, S.J.: Some statistical issues in the comparison of speech recognition algorithms. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 1, pp. 23–26. IEEE, Glasgow (1989)
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this chapter
Schuller, B. (2013). Audio Recognition. In: Intelligent Audio Analysis. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36806-6_7