Abstract
Hybrid coders select a coding mode for each type of audio data and thereby obtain high coding efficiency in a universal scheme at low bitrates. Selection accuracy is critical to maintaining the quality of the decoded audio signal, and the computational complexity of the hybrid coder also matters because many applications run on mobile devices. In this paper, a low-complexity coding mode selection method based on Recurrent Neural Networks (RNN) is investigated to improve the coding performance of the state-of-the-art hybrid coder AMR-WB+. A constraint is imposed on the outputs of the RNN through a sigmoid function to improve the SNR of the decoded audio signal. The experimental results show that the proposed method achieves nearly the same decoded audio quality as the closed-loop method, with complexity comparable to the open-loop method in the AMR-WB+ coder, and outperforms some recent classification methods for this standard.
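To make the idea of per-frame mode selection concrete, the following is a minimal sketch, not the method of the paper. It assumes a single-layer Elman RNN with random, untrained weights, four hypothetical features per frame, a fixed 0.5 threshold, and a simplified mapping of speech-like frames to ACELP and music-like frames to TCX (the two coding families used in AMR-WB+). The abstract does not specify the network architecture, the input features, or how the sigmoid constraint is tied to SNR, so all names and parameters below are illustrative assumptions.

```python
import math
import random


def sigmoid(x):
    """Squash a real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


class TinyRNN:
    """Single-layer Elman RNN with one sigmoid output per frame.

    Hypothetical stand-in for the classifier; weights are random, not trained.
    """

    def __init__(self, n_features, n_hidden, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.w_in = mat(n_hidden, n_features)   # input-to-hidden weights
        self.w_rec = mat(n_hidden, n_hidden)    # hidden-to-hidden weights
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.n_hidden = n_hidden

    def forward(self, frames):
        """Return one value in (0, 1) per input frame."""
        h = [0.0] * self.n_hidden
        probs = []
        for x in frames:
            # New hidden state depends on the current frame and the previous state.
            h = [
                math.tanh(
                    sum(w * xi for w, xi in zip(self.w_in[j], x))
                    + sum(w * hi for w, hi in zip(self.w_rec[j], h))
                )
                for j in range(self.n_hidden)
            ]
            # The sigmoid bounds the output so it can be thresholded per frame.
            probs.append(sigmoid(sum(w * hi for w, hi in zip(self.w_out, h))))
        return probs


def select_modes(probs, threshold=0.5):
    """Map each bounded output to a coding mode (assumed mapping)."""
    return ["ACELP" if p >= threshold else "TCX" for p in probs]


# Usage: six frames of four made-up features each.
rng = random.Random(1)
frames = [[rng.random() for _ in range(4)] for _ in range(6)]
model = TinyRNN(n_features=4, n_hidden=8)
probs = model.forward(frames)
modes = select_modes(probs)
```

Because the classifier runs once per frame instead of encoding each frame with every candidate mode, its cost is closer to an open-loop decision than to a closed-loop search, which is the trade-off the abstract describes.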
W. Tu: This work is supported by the National Natural Science Foundation of China (No. 61671335).
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Yang, W., Tu, W., Zheng, J., Zhang, X., Yang, Y., Song, Y. (2018). An RNN-Based Speech-Music Discrimination Used for Hybrid Audio Coder. In: Schoeffmann, K., et al. MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol 10704. Springer, Cham. https://doi.org/10.1007/978-3-319-73603-7_7
DOI: https://doi.org/10.1007/978-3-319-73603-7_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73602-0
Online ISBN: 978-3-319-73603-7
eBook Packages: Computer Science, Computer Science (R0)