An RNN-Based Speech-Music Discrimination Used for Hybrid Audio Coder

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10704)

Abstract

Hybrid coders select a coding mode for each type of audio data and thereby achieve high coding efficiency at low bitrates within a single universal scheme. Selection accuracy is critical to the quality of the decoded audio signal, and the computational complexity of the hybrid coder also matters because many applications run on mobile devices. In this paper, a low-complexity coding mode selection method based on Recurrent Neural Networks (RNN) is investigated to improve the coding performance of the state-of-the-art hybrid coder AMR-WB+. A constraint is imposed on the outputs of the RNN through a sigmoid function to improve the SNR of the decoded audio signal. The experimental results show that, in the AMR-WB+ coder, the proposed method achieves nearly the same decoded audio quality as the closed-loop method at a complexity comparable to that of the open-loop method, and outperforms several recent classification methods for this standard.
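The idea of bounding a recurrent network's output with a sigmoid and turning it into a per-frame mode decision can be illustrated with a minimal sketch. This is not the authors' network: the single-layer Elman structure, the feature dimensions, the random weights, the 0.5 threshold, and the "speech"/"music" labels are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import random


def sigmoid(x):
    """Logistic function: maps any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def rnn_frame_scores(frames, w_in, w_rec, w_out, b_h, b_out):
    """Run a single-layer Elman RNN over a sequence of feature vectors.

    Hypothetical layout: w_in[j][i] weights input i into hidden unit j,
    w_rec[j][k] weights previous hidden unit k into hidden unit j.
    Returns one sigmoid-bounded score per frame.
    """
    hidden = [0.0] * len(b_h)
    scores = []
    for x in frames:
        new_hidden = []
        for j in range(len(b_h)):
            acc = b_h[j]
            acc += sum(w_in[j][i] * x[i] for i in range(len(x)))
            acc += sum(w_rec[j][k] * hidden[k] for k in range(len(hidden)))
            new_hidden.append(math.tanh(acc))
        hidden = new_hidden
        logit = b_out + sum(w_out[j] * hidden[j] for j in range(len(hidden)))
        scores.append(sigmoid(logit))  # output constrained to (0, 1)
    return scores


def select_modes(scores, threshold=0.5):
    """Map each score to an assumed class label for coding mode selection."""
    return ["music" if s >= threshold else "speech" for s in scores]


# Demo with untrained, randomly initialised weights (illustrative only).
rng = random.Random(0)
n_in, n_hid, n_frames = 4, 3, 5
w_in = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hid)]
w_rec = [[rng.uniform(-1, 1) for _ in range(n_hid)] for _ in range(n_hid)]
w_out = [rng.uniform(-1, 1) for _ in range(n_hid)]
b_h = [0.0] * n_hid
frames = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_frames)]

demo_scores = rnn_frame_scores(frames, w_in, w_rec, w_out, b_h, 0.0)
demo_modes = select_modes(demo_scores)
print(list(zip(demo_scores, demo_modes)))
```

Because the hidden state carries over from frame to frame, each decision depends on the preceding frames as well as the current one, which is the property that distinguishes a recurrent classifier from a frame-by-frame one.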

W. Tu: This work is supported by the National Natural Science Foundation of China (No. 61671335).

Corresponding author

Correspondence to Weiping Tu.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Yang, W., Tu, W., Zheng, J., Zhang, X., Yang, Y., Song, Y. (2018). An RNN-Based Speech-Music Discrimination Used for Hybrid Audio Coder. In: Schoeffmann, K., et al. MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol 10704. Springer, Cham. https://doi.org/10.1007/978-3-319-73603-7_7

  • DOI: https://doi.org/10.1007/978-3-319-73603-7_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73602-0

  • Online ISBN: 978-3-319-73603-7

  • eBook Packages: Computer Science, Computer Science (R0)
