Deep Learning Based Audio Scene Classification

Sophiya, E.; Jothilakshmi, S.

doi:10.1007/978-981-13-0716-4_9

Part of the book series: Communications in Computer and Information Science (CCIS, volume 844)

Included in the following conference series:

International Conference on Computational Intelligence, Cyber Security, and Computational Models

Abstract

Acoustic Scene Classification (ASC) is the task of recognizing and categorizing an audio signal according to the environment in which it was produced. This work develops a Deep Neural Network (DNN) based system that detects real-life environments by analyzing their sound data. Log mel-band features are used to represent the characteristics of the input audio scenes. The parameters of the DNN are set according to the DNN baseline of the DCASE 2017 challenge. The system is evaluated on the TUT dataset (2017), and the result is compared with the provided baseline. The proposed model achieves an accuracy of 82%, which is better than the baseline system.
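The pipeline described in the abstract (log mel-band features fed to a DNN) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the frame length, hop size, 40 mel bands, two hidden layers of 50 units, and 15 output classes are illustrative assumptions, the function names are hypothetical, and the network weights are random rather than trained.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel(signal, sr, n_fft=2048, hop=1024, n_mels=40):
    """Log mel-band energies per frame, shape (n_frames, n_mels)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel_energy = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel_energy + 1e-10)  # small offset avoids log(0)

def init_mlp(sizes, rng):
    """Random (untrained) weights; sizes such as [40, 50, 50, 15] are assumptions."""
    return [(rng.standard_normal((a, b)) * 0.1, np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def mlp_forward(x, params):
    """ReLU hidden layers followed by a softmax output, applied frame by frame."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)
    W, b = params[-1]
    logits = h @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def classify_clip(signal, sr, params):
    """Clip-level decision: sum frame-wise log-probabilities, pick the best class."""
    probs = mlp_forward(log_mel(signal, sr), params)
    return int(np.log(probs + 1e-12).sum(axis=0).argmax())

rng = np.random.default_rng(0)
sr = 44100
clip = rng.standard_normal(sr)           # one second of noise as a stand-in for audio
params = init_mlp([40, 50, 50, 15], rng)
print(log_mel(clip, sr).shape, classify_clip(clip, sr, params))
```

In a real system the weights would be learned from labelled scene recordings, and the features would typically be normalized before being passed to the network; both steps are omitted here for brevity.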

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Annamalai University, Annamalainagar, India
E. Sophiya
Department of Information Technology, Annamalai University, Annamalainagar, India
S. Jothilakshmi

Corresponding author

Correspondence to E. Sophiya.

Editor information

Editors and Affiliations

PSG College of Technology, Coimbatore, India
Geetha Ganapathi
BITS Pilani, KK Birla, Goa, India
Arumugam Subramaniam
University of the Basque Country, San Sebastian, Spain
Manuel Graña
PSG College of Technology, Coimbatore, India
Suresh Balusamy
PSG College of Technology, Coimbatore, India
Rajamanickam Natarajan
PSG College of Technology, Coimbatore, India
Periakaruppan Ramanathan

About this paper

Cite this paper

Sophiya, E., Jothilakshmi, S. (2018). Deep Learning Based Audio Scene Classification. In: Ganapathi, G., Subramaniam, A., Graña, M., Balusamy, S., Natarajan, R., Ramanathan, P. (eds) Computational Intelligence, Cyber Security and Computational Models. Models and Techniques for Intelligent Systems and Automation. ICC3 2017. Communications in Computer and Information Science, vol 844. Springer, Singapore. https://doi.org/10.1007/978-981-13-0716-4_9

DOI: https://doi.org/10.1007/978-981-13-0716-4_9
Published: 11 September 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-0715-7
Online ISBN: 978-981-13-0716-4
eBook Packages: Computer Science, Computer Science (R0)
