Context Extraction Through Audio Signal Analysis

  • Paris Smaragdis
  • Regunathan Radhakrishnan
  • Kevin W. Wilson
Part of the Signals and Communication Technology book series (SCT)


Much multimedia content comes with a soundtrack that content analysis applications often fail to exploit. In this chapter we cover some of the essential techniques for performing context analysis on audio signals. We describe the most popular approaches to representing audio signals, learning their structure, and constructing classifiers that recognize specific sounds, as well as algorithms for locating where sounds are coming from. Used in the context of content analysis, these tools can provide powerful descriptors that help us find events that would otherwise be hard to locate.


Hidden Markov Model, Discrete Cosine Transform, Gaussian Mixture Model, Audio Signal, Precedence Effect
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Paris Smaragdis
    • 1
  • Regunathan Radhakrishnan
  • Kevin W. Wilson
  1. Adobe Systems Inc., Newton 02466, USA
