Abstract
A huge variety of learning algorithms is applied in the field of Intelligent Audio Analysis, depending on the nature of the target task of interest, such as whether it is static or dynamic. In addition, manifold other factors, such as efficiency or reliability, are decisive when selecting the optimal algorithm. The most frequently encountered representatives are explained in detail: Decision Trees, Support Vector-based approaches, Artificial Neural Networks including the Long Short-Term Memory paradigm, as well as dynamic modelling by hidden Markov models. Furthermore, bootstrapping, meta-learning, and tandem learning are described. This is followed by the evaluation of such algorithms, touching on the partitioning and balancing of data as well as the evaluation measures frequently used in the field.
Learning without thought is labor lost; thought without learning is perilous.
—Confucius.
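The abstract mentions evaluation measures as frequently used in the field; one such measure is unweighted average recall (UAR), employed, for instance, in the INTERSPEECH 2009 Emotion Challenge cited in the references below. The following is a minimal illustrative sketch, not taken from the chapter; the function and label names are hypothetical.

```python
# Minimal illustrative sketch (not from the chapter): unweighted average
# recall (UAR), an evaluation measure frequently used in the field because
# it is robust against imbalanced class distributions.
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of the per-class recalls; every class is weighted equally."""
    correct = defaultdict(int)  # correctly classified instances per class
    total = defaultdict(int)    # total instances per class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced two-class example:
y_true = ["neg", "neg", "neg", "neg", "pos"]
y_pred = ["neg", "neg", "neg", "pos", "pos"]
print(unweighted_average_recall(y_true, y_pred))  # (3/4 + 1/1) / 2 = 0.875
```

Unlike overall accuracy (weighted average recall), UAR does not reward a classifier for simply predicting the majority class, which is why it is preferred when class distributions are skewed.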
Notes
1. Note that, for better readability, the time \(t\) is used in this section in the subscript or as the argument, e.g., \(x_t\) or \(x(t)\), following [32].
References
Kroschel, K., Rigoll, G., Schuller, B.: Statistische Informationstechnik, 5th edn. Springer, Berlin (2011)
Quinlan, J.R.: C4.5: programs for machine learning. Morgan Kaufmann, Burlington (1993)
Quinlan, J.: Learning efficient classification procedures and their application to chess end games. In: Machine learning: an artificial intelligence approach, pp. 106–121. Tioga Publishing, Palo Alto (1983)
Quinlan, J.: Simplifying decision trees. Int. J. Man Mach. Stud. 27, 221–234 (1987)
Quinlan, J.: Bagging, boosting and C4.5. In: Proceedings of the 14th National Conference on AI, vol. 5, pp. 725–730. AAAI Press, Menlo Park (1996)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
Hochreiter, S., Mozer, M., Obermayer, K.: Coulomb classifiers: generalizing support vector machines via an analogy to electrostatic systems. Adv. Neural Inf. Process. Syst. 15 (2002)
Cristianini, N., Shawe-Taylor, J.: An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, Cambridge (2000)
Platt, J.: Sequential minimal optimization: a fast algorithm for training support vector machines. Technical report MSR-TR-98-14, Microsoft Research, Redmond (1998)
Schölkopf, B., Smola, A.: Learning with kernels: support vector machines, regularization, optimization, and beyond (Adaptive computation and machine learning). MIT Press, Cambridge (2002)
Yang, H., Xu, Z., Ye, J., King, I., Lyu, M.: Efficient sparse generalized multiple kernel learning. IEEE Trans. Neural Netw. 22(3), 433–446 (2011)
Lodhi, H., Saunders, C., Shawe-Taylor, J., Cristianini, N., Watkins, C.: Text classification using string kernels. J. Mach. Learn. Res. 2, 419–444 (2002)
Smola, A., Schölkopf, B.: A tutorial on support vector regression. Stat. Comput. 14(3), 199–222 (2004)
Niemann, H.: Klassifikation von Mustern, 2nd, revised and extended edn. Published online (2003)
McCulloch, W., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5, 115–133 (1943)
Schuller, B.: Automatische Emotionserkennung aus sprachlicher und manueller Interaktion. Doctoral thesis, Technische Universität München, Munich (2006)
Rigoll, G.: Neuronale Netze. Expert-Verlag (1994)
Deller, J., Proakis, J., Hansen, J.: Discrete-time processing of speech signals. Macmillan Publishing Company, New York (1993)
Rumelhart, D., Hinton, G., Williams, R.: Learning internal representations by error propagation. In: Parallel distributed processing: explorations in the microstructure of cognition, vol. 1, pp. 318–362. MIT Press, Cambridge (1987)
Schalkoff, R.: Artificial neural networks. McGraw-Hill, New York (1994)
Riedmiller, M., Braun, H.: Rprop—a fast adaptive learning algorithm. In: Proceedings of the International Symposium on Computer and Information Science, vol. 7 (1992)
Lacoste, A., Eck, D.: Onset detection with artificial neural networks. In: MIREX (2005)
Werbos, P.: Backpropagation through time: what it does and how to do it. Proc. IEEE 78(10), 1550–1560 (1990)
Hochreiter, S., Bengio, Y., Frasconi, P., Schmidhuber, J.: Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In: Kremer, S.C., Kolen, J.F. (eds.) A field guide to dynamical recurrent neural networks, pp. 1–15. IEEE Press, New York (2001)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Gers, F., Schmidhuber, J., Cummins, F.: Learning to forget: continual prediction with LSTM. Neural Comput. 12(10), 2451–2471 (2000)
Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 18(5–6), 602–610 (2005)
Graves, A.: Supervised sequence labelling with recurrent neural networks. Ph.D thesis, Technische Universität München, Munich (2008)
Wöllmer, M., Schuller, B., Batliner, A., Steidl, S., Seppi, D.: Tandem decoding of children’s speech for keyword detection in a child-robot interaction scenario. ACM Trans. Speech Lang. Process. (Special Issue Speech Lang. Process. Children’s Speech Child-Mach. Interact. Appl.) 7(4), 22 pages (2011)
Wöllmer, M., Schuller, B., Eyben, F., Rigoll, G.: Combining long short-term memory and dynamic Bayesian networks for incremental emotion-sensitive artificial listening. IEEE J. Sel. Top. Sig. Process. (Special Issue Speech Process. Nat. Interact. Intell. Environ.) 4(5), 867–881 (2010)
Wöllmer, M., Blaschke, C., Schindl, T., Schuller, B., Färber, B., Mayer, S., Trefflich, B.: On-line driver distraction detection using long short-term memory. IEEE Trans. Intell. Transp. Syst. 12(2), 574–582 (2011)
Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77, 257–286 (1989)
O’Shaughnessy, D.: Speech communication, 2nd edn. Addison-Wesley, Boston (1990)
Jelinek, F.: Statistical methods for speech recognition. MIT Press, Cambridge (1997)
Baum, L.E., Petrie, T., Soules, G., Weiss, N.: A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Stat. 41(1), 164–171 (1970)
Ruske, G.: Automatische Spracherkennung: Methoden der Klassifikation und Merkmalsextraktion, 2nd edn. Oldenbourg (1993)
White, C.M., Rastrow, A., Khudanpur, S., Jelinek, F.: Unsupervised estimation of the language model scaling factor. In: Proceedings of Interspeech, pp. 1195–1198, Brighton (2009)
Furui, S.: Digital speech processing, synthesis, and recognition, 2nd edn. Signal Processing and Communications. Marcel Dekker Inc., New York (1996)
Lowerre, B.: The Harpy speech recognition system. Ph.D thesis, Carnegie Mellon University, Pittsburgh (1976)
Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)
Webb, G.: Multiboosting: a technique for combining boosting and wagging. Mach. Learn. 40, 159–198 (2000)
Valiant, L.: A theory of the learnable. Commun. ACM 27(11), 1134–1142 (1984)
Freund, Y., Schapire, R.: Experiments with a new boosting algorithm. In: Proceedings of the International Conference on Machine Learning, pp. 148–156 (1996)
Wolpert, D.: Stacked generalization. Neural Netw. 5, 241–259 (1992)
Ting, K., Witten, I.: Issues in stacked generalization. J. Artif. Intell. Res. 10(1), 271–289 (1999)
Seewald, A.: Towards understanding stacking—Studies of a general ensemble learning scheme. Ph.D thesis, Technische Universität Wien, Vienna (2003)
Schuller, B., Steidl, S., Batliner, A.: The INTERSPEECH 2009 emotion challenge. In: Proceedings of INTERSPEECH 2009, 10th Annual Conference of the International Speech Communication Association, pp. 312–315. ISCA, Brighton (2009)
Schuller, B., Müller, R., Eyben, F., Gast, J., Hörnler, B., Wöllmer, M., Rigoll, G., Höthker, A., Konosu, H.: Being bored? Recognising natural interest by extensive audiovisual integration for real-life application. Image Vision Comput. (Special Issue Visual Multimodal Anal. Human Spontaneous Behav.) 27(12), 1760–1774 (2009)
Schuller, B., Schenk, J., Rigoll, G., Knaup, T.: “The Godfather” vs. “Chaos”: comparing linguistic analysis based on online knowledge sources and bags-of-n-grams for movie review valence estimation. In: Proceedings of the 10th International Conference on Document Analysis and Recognition, ICDAR 2009, pp. 858–862. IAPR, IEEE, Barcelona (2009)
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
Mower, E., Mataric, M., Narayanan, S.: A framework for automatic human emotion classification using emotional profiles. IEEE Trans. Audio Speech Lang. Process. 19(5), 1057–1070 (2011)
Rozeboom, W.: The fallacy of the null-hypothesis significance test. Psychol. Bull. 57, 416–428 (1960)
Nickerson, R.S.: Null hypothesis significance testing: a review of an old and continuing controversy. Psychol. Methods 5, 241–301 (2000)
Eysenck, H.: The concept of statistical significance and the controversy about one-tailed tests. Psychol. Rev. 67, 269–271 (1960)
Gillick, L., Cox, S.J.: Some statistical issues in the comparison of speech recognition algorithms. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 1, pp. 23–26. IEEE, Glasgow (1989)
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this chapter
Schuller, B. (2013). Audio Recognition. In: Intelligent Audio Analysis. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36806-6_7