Abstract
Acoustic Scene Classification (ASC) is the task of recognizing and categorizing an audio signal according to the environment in which it was produced. This work aims to develop a Deep Neural Network (DNN) based system that detects real-life environments by analyzing their sound data. Log mel-band features are used to represent the characteristics of the input audio scenes. The parameters of the DNN are set according to the DNN baseline of the DCASE 2017 challenge. The system is evaluated on the TUT dataset (2017), and the results are compared with the provided baseline. The proposed model achieves an accuracy of 82%, outperforming the baseline system.
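The abstract names the pipeline stages but does not give their settings. Those stages are log mel-band features, a DNN configured like the DCASE 2017 baseline, and a scene-level decision. The sketch below illustrates the stages under assumptions that are not taken from this paper:

* 40 log mel-band energies computed from 40 ms Hamming-windowed frames with a 20 ms hop at 44.1 kHz, using an HTK-style mel scale and a power spectrum.
* A 5-frame context, which gives 200-dimensional inputs.
* Two ReLU hidden layers of 50 units and 15 output classes.
* A majority vote over frame-level decisions.

The weights are random placeholders, and training and dropout are omitted. The helper names (`log_mel_energies`, `stack_context`, `init_mlp`, `classify_recording`) are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced uniformly on the mel scale from 0 Hz to sr/2
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        left, center, right = hz_pts[m], hz_pts[m + 1], hz_pts[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_energies(signal, sr=44100, frame_len=0.04, hop_len=0.02, n_mels=40, eps=1e-10):
    # Frame the signal, apply a Hamming window, take the power spectrum,
    # project onto the mel filterbank, and take the log -> (n_frames, n_mels)
    n_fft = int(round(frame_len * sr))
    hop = int(round(hop_len * sr))
    window = np.hamming(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    return np.log(power @ mel_filterbank(sr, n_fft, n_mels).T + eps)

def stack_context(feats, context=5):
    # Concatenate `context` consecutive frames into one input vector
    n = feats.shape[0] - context + 1
    return np.stack([feats[i:i + context].reshape(-1) for i in range(n)])

def init_mlp(sizes, rng):
    # Hypothetical random weights standing in for a trained network
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_predict(x, weights):
    # ReLU hidden layers, then a linear output layer; argmax gives frame labels
    h = x
    for W, b in weights[:-1]:
        h = np.maximum(0.0, h @ W + b)
    W, b = weights[-1]
    return np.argmax(h @ W + b, axis=1)

def classify_recording(signal, weights, sr=44100, context=5):
    # Frame-level predictions combined by majority vote into one scene label
    X = stack_context(log_mel_energies(signal, sr=sr), context)
    frame_preds = mlp_predict(X, weights)
    n_classes = weights[-1][1].shape[0]
    return int(np.bincount(frame_preds, minlength=n_classes).argmax())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.standard_normal(44100)        # 1 s of noise as a stand-in recording
    net = init_mlp([200, 50, 50, 15], rng)   # 5 frames x 40 bands = 200 inputs
    print(classify_recording(audio, net))
```

The frame-plus-vote design lets a frame-level classifier produce one label per recording. Because the weights here are untrained, the printed label carries no meaning; only the shapes and data flow are illustrative.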
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Sophiya, E., Jothilakshmi, S. (2018). Deep Learning Based Audio Scene Classification. In: Ganapathi, G., Subramaniam, A., Graña, M., Balusamy, S., Natarajan, R., Ramanathan, P. (eds) Computational Intelligence, Cyber Security and Computational Models. Models and Techniques for Intelligent Systems and Automation. ICC3 2017. Communications in Computer and Information Science, vol 844. Springer, Singapore. https://doi.org/10.1007/978-981-13-0716-4_9
DOI: https://doi.org/10.1007/978-981-13-0716-4_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-0715-7
Online ISBN: 978-981-13-0716-4
eBook Packages: Computer Science, Computer Science (R0)