Environmental Sound Recognition by Measuring Significant Changes in the Spectral Entropy

  • Jessica Beltrán-Márquez
  • Edgar Chávez
  • Jesús Favela
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7329)


Automatic identification of activities can provide caregivers of persons with dementia with the information they need to identify assistance needs. Environmental audio carries significant and representative information about the context, which makes microphones a natural choice for identifying activities automatically. In real situations, however, the audio captured by microphones comes from overlapping sound sources, which makes identification a challenge for audio analysis and retrieval. In this paper we propose a succinct representation of the signal: we measure its multiband spectral entropy frame by frame and then apply a cosine transform and a binary codification. We call this representation the Cosine Multi-Band Spectral Entropy Signature (CMBSES). To test our proposal, we created a database of mix-ups of triples drawn from a collection of nine environmental sounds, at four different signal-to-noise ratios (SNR). We codified both the original sounds and the triples and then searched for all the original sounds in the mix-up collection. To establish a ground truth, we also tested the same database with 48 people of assorted ages. Our feature extraction outperforms the state-of-the-art Mel Frequency Cepstral Coefficients (MFCC), and it also surpasses the human listeners in the experiment.
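The pipeline named in the abstract (per-frame multiband spectral entropy, a cosine transform, then binary codification) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the frame length, the band layout, or the exact binarization rule, so the equal-width bands, the Hann window, the parameters `frame_len`, `hop`, and `n_bands`, and the rule "1 if the coefficient increased relative to the previous frame" are all assumptions made here. The Hamming-distance comparison is likewise only an assumed way to compare two binary signatures.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum of a real frame via a naive DFT (bins 0..N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def band_entropies(power, n_bands):
    """Shannon entropy (in bits) of the normalized power inside each band.

    Assumption: bands have equal width; the abstract does not specify them.
    """
    width = len(power) // n_bands
    entropies = []
    for b in range(n_bands):
        band = power[b * width:(b + 1) * width]
        total = sum(band)
        if total <= 0:
            entropies.append(0.0)
            continue
        entropies.append(-sum((p / total) * math.log2(p / total)
                              for p in band if p > 0))
    return entropies


def dct2(values):
    """Unnormalized DCT-II of a short vector."""
    n = len(values)
    return [sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                for i, v in enumerate(values))
            for k in range(n)]


def signature(signal, frame_len=64, hop=32, n_bands=4):
    """Binary signature: one row of n_bands bits per pair of adjacent frames."""
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Hann window to reduce spectral leakage (an assumption of this sketch).
        windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1)))
                    for i, x in enumerate(frame)]
        rows.append(dct2(band_entropies(power_spectrum(windowed), n_bands)))
    # Assumed binarization: 1 where a coefficient grew since the previous frame.
    return [[1 if cur[k] > prev[k] else 0 for k in range(n_bands)]
            for prev, cur in zip(rows, rows[1:])]


def hamming(sig_a, sig_b):
    """Number of differing bits between two signatures of equal shape."""
    return sum(x != y for ra, rb in zip(sig_a, sig_b) for x, y in zip(ra, rb))


# Usage: compare a tone with a copy of itself mixed with a second tone.
clean = [math.sin(0.3 * t) for t in range(256)]
mixed = [c + 0.5 * math.sin(1.1 * t) for t, c in enumerate(clean)]
distance = hamming(signature(clean), signature(mixed))
```

A smaller `distance` indicates more similar signatures. Searching for an original sound inside a longer mix-up would slide the shorter signature along the longer one and keep the offset with the lowest distance, which is again an assumption about how such a search could be done rather than a description of the paper's procedure.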


Hidden Markov Model, Gaussian Mixture Model, Activity Recognition, Independent Component Analysis, Sound Event
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Jessica Beltrán-Márquez (1)
  • Edgar Chávez (2)
  • Jesús Favela (1)
  1. CICESE, Mexico
  2. Universidad Michoacana, Mexico
