Summary
This chapter surveys contemporary approaches to automatic sound recognition and discusses the benefits that stem from real-world applications of this technology. We identify the common aspects and subtle differences among these diverse application areas and review state-of-the-art systems. In this context, we argue that there is considerable room for knowledge transfer between the subfields of sound classification, which appear to evolve independently and have reached different levels of maturity. Particular emphasis is given to lessons learned from the speech recognition paradigm; speech recognition and speaker recognition were among the first applications of sound classification to reach large-scale commercial deployment. Special attention is paid to emerging applications such as environmental monitoring and bioacoustic identification, and to applications in music, which have already begun to change our everyday life.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Potamitis, I., Ganchev, T. (2008). Generalized Recognition of Sound Events: Approaches and Applications. In: Tsihrintzis, G.A., Jain, L.C. (eds) Multimedia Services in Intelligent Environments. Studies in Computational Intelligence, vol 120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78502-6_3
DOI: https://doi.org/10.1007/978-3-540-78502-6_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78491-3
Online ISBN: 978-3-540-78502-6
eBook Packages: Engineering (R0)