Abstract

Acoustic Scene Classification (ASC) is the task of recognizing and categorizing an audio signal according to the environment in which it was produced. This work develops a Deep Neural Network (DNN) based system that identifies real-life environments by analyzing their sound. Log mel-band features represent the characteristics of the input audio scenes, and the DNN parameters are set according to the DNN baseline of the DCASE 2017 challenge. The system is evaluated on the TUT dataset (2017) and compared with the provided baseline. The proposed model achieves an accuracy of 82%, outperforming the baseline system.
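The pipeline described in the abstract (log mel-band features feeding a DNN configured after the DCASE 2017 baseline) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The 40 ms Hamming window, 20 ms hop, 40 mel bands, 5-frame context window, two hidden layers of 50 ReLU units, 15 output classes, and clip-level averaging of frame probabilities are all assumptions modelled on the commonly described DCASE 2017 baseline rather than values stated in this chapter, and every function name is hypothetical. The weights are random, so the predicted label carries no meaning; training (for example with dropout and the Adam optimizer) is omitted.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (assumed; the chapter does not name a mel variant)
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping an (n_fft // 2 + 1)-bin power spectrum to n_mels bands."""
    bin_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, bin_freqs.size))
    for i in range(n_mels):
        left, center, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel(signal, sr, win_ms=40.0, hop_ms=20.0, n_mels=40, eps=1e-10):
    """Log mel-band energies, one row per frame (framing values are assumptions)."""
    win = int(round(sr * win_ms / 1000.0))
    hop = int(round(sr * hop_ms / 1000.0))
    n_fft = 1 << (win - 1).bit_length()  # next power of two >= window length
    window = np.hamming(win)
    n_frames = 1 + (len(signal) - win) // hop
    frames = np.stack([signal[t * hop:t * hop + win] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2  # zero-padded to n_fft
    return np.log(power @ mel_filterbank(sr, n_fft, n_mels).T + eps)


def stack_context(feats, context=5):
    """Concatenate `context` consecutive frames into one input vector per position."""
    n = feats.shape[0] - context + 1
    return np.stack([feats[t:t + context].ravel() for t in range(n)])


def init_params(sizes, rng):
    """Random (untrained) He-initialised weights for a fully connected network."""
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp_forward(x, params):
    """Inference pass: ReLU hidden layers, softmax output (dropout is off at test time)."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)
    W, b = params[-1]
    logits = h @ W + b
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


# Usage: one second of random noise stands in for a real audio clip.
rng = np.random.default_rng(0)
sr = 44100
clip = rng.standard_normal(sr)
feats = log_mel(clip, sr)                            # (49, 40)
X = stack_context(feats)                             # (45, 200)
params = init_params([X.shape[1], 50, 50, 15], rng)  # assumed: 2 x 50 hidden units, 15 scenes
probs = mlp_forward(X, params)                       # frame-level class probabilities
scene = int(np.argmax(probs.mean(axis=0)))           # assumed clip decision: mean over frames
print(feats.shape, X.shape, probs.shape, scene)
```

Averaging frame probabilities is one common way to turn frame-level outputs into a single scene label; majority voting over per-frame decisions is an alternative. In a real system the features would also be normalized with training-set statistics before entering the network.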

Author information

Correspondence to E. Sophiya.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sophiya, E., Jothilakshmi, S. (2018). Deep Learning Based Audio Scene Classification. In: Ganapathi, G., Subramaniam, A., Graña, M., Balusamy, S., Natarajan, R., Ramanathan, P. (eds) Computational Intelligence, Cyber Security and Computational Models. Models and Techniques for Intelligent Systems and Automation. ICC3 2017. Communications in Computer and Information Science, vol 844. Springer, Singapore. https://doi.org/10.1007/978-981-13-0716-4_9

  • DOI: https://doi.org/10.1007/978-981-13-0716-4_9

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-0715-7

  • Online ISBN: 978-981-13-0716-4

  • eBook Packages: Computer Science, Computer Science (R0)
