
An Integrated Processing Method Based on Wasserstein Barycenter Algorithm for Automatic Music Transcription

  • Conference paper
  • Communications and Networking (ChinaCom 2019)

Abstract

Given an acoustic music signal, automatic music transcription (AMT) methods aim to generate the corresponding musical notation without human intervention. However, existing AMT methods based on signal processing or machine learning cannot perfectly recover the original music and introduce significant distortion. In this paper, we propose a novel processing method that integrates several AMT methods to achieve better transcription performance. The integrated method is based on the entropic regularized Wasserstein barycenter algorithm, which speeds up the computation of the Wasserstein distance and minimizes the distance between discrete distributions. Moreover, we introduce the proportional transportation distance (PTD) to evaluate the performance of the different methods. Experimental results show that the precision and accuracy of the proposed method improve by approximately 48% and 67%, respectively, compared with existing methods.
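The abstract relies on two computations: an entropic regularized Wasserstein barycenter that fuses the outputs of several AMT methods, and the proportional transportation distance (PTD) used to score them. The sketch below is not the authors' implementation. It assumes that each method's output has already been reduced to a histogram over a shared, fixed support (for example, pitch bins), computes the barycenter with the iterative Bregman projection scheme of Benamou et al. (2015), and, as a simplifying assumption, evaluates PTD only in one dimension, where the earth mover's distance reduces to the area between two cumulative distributions. The function names `entropic_barycenter` and `ptd_1d`, the parameters `eps`, `n_iter`, and `tol`, and the example histograms are hypothetical illustrations.

```python
import numpy as np


def entropic_barycenter(hists, cost, weights=None, eps=0.05, n_iter=500, tol=1e-10):
    """Fixed-support entropic Wasserstein barycenter (iterative Bregman projections).

    hists:   (K, n) array; row k is the histogram produced by (hypothetical) method k
    cost:    (n, n) ground-cost matrix between the n support points
    weights: (K,) barycentric weights summing to 1 (uniform if None)
    eps:     entropic regularization strength
    """
    hists = np.asarray(hists, dtype=float)
    hists = hists / hists.sum(axis=1, keepdims=True)  # each input gets mass 1
    n_hists, n = hists.shape
    if weights is None:
        weights = np.full(n_hists, 1.0 / n_hists)
    weights = np.asarray(weights, dtype=float)
    gibbs = np.exp(-np.asarray(cost, dtype=float) / eps)  # kernel K = exp(-C / eps)
    v = np.ones((n_hists, n))
    bary = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        # Projection 1: scale plan k so its first marginal equals hists[k].
        u = hists / (v @ gibbs.T)      # row k: hists[k] / (K v_k)
        ktu = u @ gibbs                # row k: K^T u_k
        # Projection 2: force a common second marginal, the weighted
        # geometric mean of the K^T u_k; this is the barycenter estimate.
        new_bary = np.exp(weights @ np.log(ktu))
        v = new_bary / ktu
        converged = np.abs(new_bary - bary).sum() < tol
        bary = new_bary
        if converged:
            break
    return bary / bary.sum()


def ptd_1d(support, w_a, w_b):
    """Proportional transportation distance on a shared 1-D support.

    PTD rescales both weight vectors to total mass 1 and then takes the earth
    mover's distance. In 1-D with ground distance |x - y|, that EMD equals the
    area between the two cumulative distribution functions.
    """
    support = np.asarray(support, dtype=float)
    order = np.argsort(support)
    a = np.asarray(w_a, dtype=float)[order]
    b = np.asarray(w_b, dtype=float)[order]
    cdf_gap = np.abs(np.cumsum(a / a.sum()) - np.cumsum(b / b.sum()))
    return float(np.sum(cdf_gap[:-1] * np.diff(support[order])))


if __name__ == "__main__":
    # Hypothetical pitch-activation histograms from three AMT systems for one
    # frame on a 12-semitone support (illustrative numbers, not from the paper).
    pitches = np.arange(12, dtype=float)
    cost = ((pitches[:, None] - pitches[None, :]) / 11.0) ** 2  # scaled to [0, 1]
    outputs = np.array([
        [0, 0, 0, 0, 5, 1, 0, 4, 0, 0, 0, 0],
        [0, 0, 0, 1, 4, 0, 0, 5, 0, 0, 0, 0],
        [0, 0, 0, 0, 6, 0, 0, 3, 1, 0, 0, 0],
    ], dtype=float)
    fused = entropic_barycenter(outputs, cost, eps=0.01)
    for k, out in enumerate(outputs):
        print("method", k, "PTD to fused:", round(ptd_1d(pitches, out, fused), 4))
```

A smaller `eps` yields a sharper barycenter but needs more iterations and can underflow the kernel `exp(-cost / eps)`; a log-domain (stabilized) variant avoids this at some extra cost. Scaling the cost matrix to [0, 1], as in the usage example, keeps the kernel in a safe range for moderate `eps`.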

Supported by the National Natural Science Foundation of China (NSFC) under Grant No. 61631016, the National Key Research and Development Plan of the Ministry of Science and Technology under No. 2018YFB1403903, and the Fundamental Research Funds for the Central Universities under Nos. CUC2019E002 and CUC19ZD003.



Author information

Corresponding author: Xin Lv.


Copyright information

© 2020 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper


Cite this paper

Jin, C. et al. (2020). An Integrated Processing Method Based on Wasserstein Barycenter Algorithm for Automatic Music Transcription. In: Gao, H., Feng, Z., Yu, J., Wu, J. (eds) Communications and Networking. ChinaCom 2019. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 313. Springer, Cham. https://doi.org/10.1007/978-3-030-41117-6_19


  • DOI: https://doi.org/10.1007/978-3-030-41117-6_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-41116-9

  • Online ISBN: 978-3-030-41117-6

  • eBook Packages: Computer Science, Computer Science (R0)
