An Integrated Processing Method Based on Wasserstein Barycenter Algorithm for Automatic Music Transcription

Jin, Cong; Li, Zhongtong; Sun, Yuanyuan; Zhang, Haiyin; Lv, Xin; Li, Jianguang; Liu, Shouxun

doi:10.1007/978-3-030-41117-6_19

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 313)

Included in the following conference series:

International Conference on Communications and Networking in China

Abstract

Given an acoustic music signal, automatic music transcription (AMT) methods aim to generate the corresponding music notation without human intervention. However, existing AMT methods based on signal processing or machine learning cannot perfectly restore the original music signal and introduce significant distortion. In this paper, we propose a novel processing method that integrates various AMT methods to achieve better transcription performance. The integrated method is based on the entropic regularized Wasserstein Barycenter algorithm, which speeds up the computation of the Wasserstein distance and minimizes the distance between two discrete distributions. Moreover, we introduce the proportional transportation distance (PTD) to evaluate the performance of the different methods. Experimental results show that the precision and accuracy of the proposed method increase by approximately 48% and 67%, respectively, compared with the existing methods.
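The abstract does not spell out how the barycenter is computed. As an illustration only, the sketch below uses the iterative Bregman projection scheme of Benamou et al. (2015), a common way to compute entropic regularized Wasserstein barycenters; it is not the authors' implementation. Several points are assumptions made here: each AMT method's output is represented as a normalized histogram over a shared one-dimensional grid of bins, the ground cost is the squared distance between bin positions scaled to [0, 1], and the names `entropic_barycenter`, `matvec`, `bump`, `eps` and `iters` are hypothetical.

```python
import math


def matvec(M, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def entropic_barycenter(hists, weights, eps=0.02, iters=300):
    """Entropic regularized Wasserstein barycenter on a shared 1-D grid.

    hists   : list of histograms (same length, each summing to 1)
    weights : barycentric weights, one per histogram, summing to 1
    eps     : entropic regularization strength (assumed value)
    """
    n = len(hists[0])
    # Gibbs kernel for a squared-distance cost on bin positions in [0, 1].
    # K is symmetric, so K^T v and K v use the same matrix.
    K = [[math.exp(-((i - j) / (n - 1)) ** 2 / eps) for j in range(n)]
         for i in range(n)]
    u = [[1.0] * n for _ in hists]
    bary = [1.0 / n] * n
    for _ in range(iters):
        # Projection 1: make each coupling's column marginal equal its input.
        v = [[p / d for p, d in zip(h, matvec(K, uk))]
             for h, uk in zip(hists, u)]
        Kv = [matvec(K, vk) for vk in v]
        # Projection 2: the shared row marginal is the weighted geometric
        # mean of the current row marginals u_k * (K v_k).
        bary = [math.exp(sum(w * math.log(uk[i] * kv[i])
                             for w, uk, kv in zip(weights, u, Kv)))
                for i in range(n)]
        u = [[b / d for b, d in zip(bary, kv)] for kv in Kv]
    return bary


def bump(center, n=41, std=2.5):
    """Hypothetical stand-in for one method's output: a normalized bump."""
    w = [math.exp(-0.5 * ((i - center) / std) ** 2) for i in range(n)]
    total = sum(w)
    return [x / total for x in w]


# Two synthetic "transcriptions" that disagree on where the mass sits.
hists = [bump(10), bump(30)]
bary = entropic_barycenter(hists, weights=[0.5, 0.5])
print("peak bin:", bary.index(max(bary)), "total mass:", round(sum(bary), 3))
```

With equal weights the barycenter places its mass between the two inputs rather than averaging them bin by bin, which is the property that makes it a candidate for merging disagreeing estimates. A larger `eps` makes the iteration converge faster but blurs the result more.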

Supported by the National Natural Science Foundation of China (NSFC) under Grant No. 61631016, the National Key Research and Development Plan of the Ministry of Science and Technology under Grant No. 2018YFB1403903, and the Fundamental Research Funds for the Central Universities under Grant Nos. CUC2019E002 and CUC19ZD003.

Author information

Authors and Affiliations

School of Information and Communication Engineering, Communication University of China, Beijing, 100024, China
Cong Jin, Zhongtong Li & Yuanyuan Sun
School of Computer and Cyberspace Security, Communication University of China, Beijing, 100024, China
Haiyin Zhang
School of Animation and Digital Arts, Communication University of China, Beijing, 100024, China
Xin Lv
Communication University of China, Beijing, 100024, China
Jianguang Li & Shouxun Liu

Corresponding author

Correspondence to Xin Lv.

Editor information

Editors and Affiliations

Shanghai University, Shanghai, China
Honghao Gao
School of Computer Software, Tianjin University, Tianjin, China
Zhiyong Feng
Hangzhou Dianzi University, Hangzhou, China
Jun Yu
Tongji University, Shanghai Shi, China
Jun Wu

About this paper

Cite this paper

Jin, C. et al. (2020). An Integrated Processing Method Based on Wasserstein Barycenter Algorithm for Automatic Music Transcription. In: Gao, H., Feng, Z., Yu, J., Wu, J. (eds) Communications and Networking. ChinaCom 2019. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 313. Springer, Cham. https://doi.org/10.1007/978-3-030-41117-6_19

DOI: https://doi.org/10.1007/978-3-030-41117-6_19
Published: 27 February 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41116-9
Online ISBN: 978-3-030-41117-6
eBook Packages: Computer Science, Computer Science (R0)
