Sound Recognition

Chapter in: Computing with Instinct

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5897)

Abstract

Sound recognition has been a primitive survival instinct of early mammals for over 120 million years. In the modern era, it is the most affordable sensory channel for us. Here we explore an auditory vigilance algorithm for detecting background sounds such as explosions, gunshots, screams, and human voices. We introduce a general algorithm for sound feature extraction, classification and feedback. We use a Hamming window to taper the sound signals, and the short-time Fourier transform (STFT) and Principal Component Analysis (PCA) for feature extraction. We then apply a Gaussian Mixture Model (GMM) for classification, and we use feedback from the confusion matrix of the trained classifier to redefine the sound classes for better representation, accuracy and compression. We found that, in background sound recognition, frequency coefficients on a logarithmic scale yield better results than those on a linear scale. However, sample magnitudes on a logarithmic scale yield worse results than those on a linear scale. We also compare our results with those of the linear frequency model and of algorithms based on Mel-scale Frequency Cepstral Coefficients (MFCC). We conclude that our algorithm achieves higher accuracy with the available training data. We foresee broader applications of the sound recognition method, including video triage, healthcare, robotics and security.
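The feature-extraction and classification pipeline summarized above can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the frame length (1024 samples), hop (512), 24 log-spaced bands starting at 100 Hz, 8 PCA components, two diagonal-covariance mixture components per class, and all helper names (`frame_features`, `fit_pca`, `fit_gmm`, `train`, `classify`, ...) are hypothetical choices. Following the findings reported above, frequencies are pooled on a logarithmic axis while magnitudes stay linear. Training one GMM per sound class and picking the class with the highest total frame log-likelihood is a common setup assumed here; the confusion-matrix feedback step is not shown.

```python
import numpy as np

# Hypothetical parameters throughout; not taken from the chapter.
def frame_features(signal, sr, frame_len=1024, hop=512, n_bands=24, fmin=100.0):
    """Hamming-tapered STFT frames pooled into log-spaced bands (linear magnitude)."""
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop          # assumes len(signal) >= frame_len
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))                # linear magnitude spectrum per frame
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    edges = np.geomspace(fmin, sr / 2, n_bands + 1)          # logarithmic frequency axis
    bands = np.zeros((n_frames, n_bands))
    for b in range(n_bands):
        sel = (freqs >= edges[b]) & (freqs < edges[b + 1])
        if sel.any():                                         # narrow low bands may hold no FFT bin
            bands[:, b] = mag[:, sel].mean(axis=1)
    return bands

def fit_pca(X, n_components):
    """Return the feature mean and the top principal axes (rows) via SVD."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def apply_pca(X, mean, comps):
    return (X - mean) @ comps.T

def _logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def _log_gauss(X, means, var):
    """log N(x | mu_k, diag(var_k)) for every frame/component pair -> shape (n, k)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / var[None], axis=2)
                   + np.sum(np.log(2 * np.pi * var), axis=1)[None, :])

def fit_gmm(X, k=2, n_iter=50, seed=0, floor=1e-6):
    """Fit a k-component diagonal-covariance GMM to the rows of X with EM."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    means = X[rng.choice(n, k, replace=False)]               # initialise means at random frames
    var = np.tile(X.var(axis=0) + floor, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each frame
        log_p = _log_gauss(X, means, var) + np.log(weights)
        resp = np.exp(log_p - _logsumexp(log_p, axis=1)[:, None])
        # M-step: re-estimate weights, means and (floored) variances
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        var = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, floor)
    return weights, means, var

def gmm_loglik(X, gmm):
    weights, means, var = gmm
    return _logsumexp(_log_gauss(X, means, var) + np.log(weights), axis=1)

def train(clips_by_class, sr, n_components=8, k=2):
    """Fit one shared PCA on all training frames, then one GMM per sound class."""
    feats = {c: np.vstack([frame_features(x, sr) for x in clips])
             for c, clips in clips_by_class.items()}
    mean, comps = fit_pca(np.vstack(list(feats.values())), n_components)
    models = {c: fit_gmm(apply_pca(F, mean, comps), k) for c, F in feats.items()}
    return mean, comps, models

def classify(signal, sr, model):
    """Label a clip with the class whose GMM gives the highest total frame log-likelihood."""
    mean, comps, models = model
    Z = apply_pca(frame_features(signal, sr), mean, comps)
    return max(models, key=lambda c: gmm_loglik(Z, models[c]).sum())
```

The PCA projection is fitted once on frames from every class so that all class models share the same reduced feature space. Summing frame log-likelihoods per clip treats frames as independent, which is a simplifying assumption of this sketch.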

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Cai, Y., Pados, K.D. (2011). Sound Recognition. In: Cai, Y. (ed.) Computing with Instinct. Lecture Notes in Computer Science, vol. 5897. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19757-4_2

  • DOI: https://doi.org/10.1007/978-3-642-19757-4_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19756-7

  • Online ISBN: 978-3-642-19757-4

  • eBook Packages: Computer Science, Computer Science (R0)
