Audio Coding and Classification: Principles and Algorithms

  • Karthikeyan Umapathy
  • Sridhar Krishnan


19.1 Introduction

A normal human can hear sound vibrations in the range from 20 Hz to 20 kHz. Signals that create such audible vibrations qualify as audio signals. Creating, modulating, and interpreting audio cues were among the foremost abilities that differentiated humans from other animal species. Over the years, the methodical creation and processing of audio signals led to the development of different forms of communication, entertainment, and even biomedical diagnostic tools. With advances in technology, audio processing was automated and various enhancements were introduced. The current digital era has furthered audio processing with the power of computers: complex audio processing tasks are easily implemented and performed at high speed. Digitally converted and formatted audio signals brought high levels of noise immunity with guaranteed quality of reproduction over time. However, on the one hand, the...


Keywords (added by machine, not by the authors): Compression Ratio; Audio Signal; Critical Band; Modified Discrete Cosine Transform; Audio Classification



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Karthikeyan Umapathy (1)
  • Sridhar Krishnan (1)
  1. Ryerson University, Canada
