
Separation of Drum and Bass from Monaural Tracks


Part of the book series: Smart Innovation, Systems and Technologies ((SIST,volume 102))

Abstract

In this paper, we propose a deep recurrent neural network (DRNN), based on the Long Short-Term Memory (LSTM) unit, for the separation of drum and bass sources from a monaural audio track. A single DRNN with six hidden layers (three feedforward and three recurrent) is used for each source to be separated. In this work, we restrict our attention to two challenging sources: drum and bass. Experimental results show the effectiveness of the proposed approach with respect to a state-of-the-art method, expressed in terms of well-known source separation metrics.
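As a rough illustration of the architecture summarized above (one network per source, three feedforward and three recurrent LSTM hidden layers operating on spectrogram frames of the mixture), the following is a minimal sketch in PyTorch. It is not the authors' implementation: the framework, layer ordering (feedforward before recurrent), layer sizes, input dimension, and output non-linearity are all assumptions made for illustration only.

```python
# Minimal sketch (not the chapter's code) of a per-source DRNN with three
# feedforward and three LSTM hidden layers. It maps the magnitude
# spectrogram of the mixture to an estimate of one source's magnitude
# spectrogram. All sizes and the layer ordering are assumptions.
import torch
import torch.nn as nn

class SourceDRNN(nn.Module):
    def __init__(self, n_bins=513, hidden=512):
        super().__init__()
        # Three feedforward (fully connected) hidden layers
        self.ff = nn.Sequential(
            nn.Linear(n_bins, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Three stacked recurrent (LSTM) hidden layers
        self.rnn = nn.LSTM(hidden, hidden, num_layers=3, batch_first=True)
        # Output layer: estimated magnitude spectrogram of the target source
        self.out = nn.Linear(hidden, n_bins)

    def forward(self, mix_mag):
        # mix_mag: (batch, time, n_bins) magnitude spectrogram of the mixture
        h = self.ff(mix_mag)
        h, _ = self.rnn(h)
        return torch.relu(self.out(h))  # non-negative source estimate

# One network per source to be separated (drums and bass)
drum_net, bass_net = SourceDRNN(), SourceDRNN()
x = torch.rand(1, 100, 513)  # 100 frames of a mixture spectrogram
drum_mag, bass_mag = drum_net(x), bass_net(x)
```

In a typical single-channel pipeline, the two estimated magnitude spectrograms would then be combined with the phase of the mixture and inverted back to the time domain; this reconstruction step is standard practice and, like the sizes above, is not taken from the chapter itself.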


Notes

  1. Available at: http://medleydb.weebly.com/.


Author information


Corresponding author

Correspondence to Michele Scarpiniti.


Copyright information

© 2019 Springer International Publishing AG, part of Springer Nature

About this chapter


Cite this chapter

Scarpiniti, M., Scardapane, S., Comminiello, D., Parisi, R., Uncini, A. (2019). Separation of Drum and Bass from Monaural Tracks. In: Esposito, A., Faundez-Zanuy, M., Morabito, F., Pasero, E. (eds) Neural Advances in Processing Nonlinear Dynamic Signals. WIRN 2017. Smart Innovation, Systems and Technologies, vol 102. Springer, Cham. https://doi.org/10.1007/978-3-319-95098-3_13

