Neural Melody Composition from Lyrics

Bao, Hangbo; Huang, Shaohan; Wei, Furu; Cui, Lei; Wu, Yu; Tan, Chuanqi; Piao, Songhao; Zhou, Ming

doi:10.1007/978-3-030-32233-5_39

Conference paper
First Online: 30 September 2019

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11838)

Abstract

In this paper, we study a novel task: learning to compose music from natural language. Given lyrics as input, we propose a melody composition model that generates a lyrics-conditioned melody and, simultaneously, the exact alignment between the generated melody and the given lyrics. More specifically, we develop the melody composition model based on the sequence-to-sequence framework. It consists of two neural encoders that encode the current lyrics and the context melody, respectively, and a hierarchical decoder that jointly produces musical notes and the corresponding alignment. Experimental results on lyrics-melody pairs from 18,451 pop songs demonstrate the effectiveness of our proposed methods. In addition, we use singing voice synthesizer software to synthesize the "singing" of the lyrics and melodies for human evaluation. Results indicate that our generated melodies are more melodious and tuneful than those of the baseline method.
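
To make the notion of an "exact alignment" between melody and lyrics concrete, the following is a minimal sketch of one possible representation, not the authors' data format. It assumes that a note is a (pitch, duration) pair, that pitch is a MIDI note number, that duration is counted in beats, and that the alignment is given as the number of notes sung on each syllable. The names `Note` and `align` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Note:
    pitch: int       # assumed encoding: MIDI note number (60 = middle C)
    duration: float  # assumed unit: beats


def align(
    syllables: Sequence[str],
    notes: Sequence[Note],
    notes_per_syllable: Sequence[int],
) -> List[Tuple[str, List[Note]]]:
    """Group a flat note sequence by syllable (hypothetical helper).

    Each syllable receives one or more consecutive notes, and every note
    belongs to exactly one syllable.
    """
    if len(syllables) != len(notes_per_syllable):
        raise ValueError("one note count is required per syllable")
    if any(count < 1 for count in notes_per_syllable):
        raise ValueError("each syllable needs at least one note")
    if sum(notes_per_syllable) != len(notes):
        raise ValueError("note counts must cover the melody exactly")

    aligned = []
    start = 0
    for syllable, count in zip(syllables, notes_per_syllable):
        aligned.append((syllable, list(notes[start:start + count])))
        start += count
    return aligned


# Illustrative input: three syllables, the second sung over two notes.
melody = [Note(60, 1.0), Note(62, 0.5), Note(64, 0.5), Note(65, 2.0)]
pairs = align(["ni", "hao", "ma"], melody, [1, 2, 1])
```

Storing a per-syllable note count keeps the melody itself a flat sequence, which is the form a sequence decoder would emit; other encodings, such as a boundary flag on each note, carry the same information.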

Notes

1. A syllable is a word or part of a word that contains a single vowel sound and is pronounced as a unit. Chinese is a monosyllabic language, which means that words (Chinese characters) predominantly consist of a single syllable (https://en.wikipedia.org/wiki/Monosyllabic_language).
2. https://newt.phys.unsw.edu.au/jw/notes.html.
3. https://en.wikipedia.org/wiki/Duration_(music).
4. https://en.wikipedia.org/wiki/Pinyin.
5. We calculate these metrics with scikit-learn, with the parameter average set to 'weighted': http://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics.
6. Singing voice synthesizer software that can synthesize Chinese songs: http://www.dsoundsoft.com/product/niaoeditor/.
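
As an illustration of what 'weighted' averaging in Note 5 means, the sketch below computes precision, recall and F1 per label and then averages them with weights proportional to each label's support (its number of true instances). It is a dependency-free approximation of that averaging scheme written for this note, not the authors' evaluation script; the function name `weighted_prf` and the sample labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def weighted_prf(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable]
) -> Tuple[float, float, float]:
    """Support-weighted precision, recall and F1 (hypothetical helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    support = Counter(y_true)       # true instances per label
    predicted = Counter(y_pred)     # predicted instances per label
    total = len(y_true)

    precision_sum = recall_sum = f1_sum = 0.0
    # Labels that never occur in y_true have zero support and zero weight,
    # so iterating over the true labels is sufficient.
    for label, count in support.items():
        true_pos = sum(
            1 for t, p in zip(y_true, y_pred) if t == label and p == label
        )
        precision = true_pos / predicted[label] if predicted[label] else 0.0
        recall = true_pos / count
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0

        weight = count / total
        precision_sum += weight * precision
        recall_sum += weight * recall
        f1_sum += weight * f1
    return precision_sum, recall_sum, f1_sum


# Illustrative labels only (here, made-up pitch classes).
p, r, f = weighted_prf([60, 60, 62, 64], [60, 62, 62, 64])
```

Because each label's recall is weighted by its support, the weighted recall equals plain accuracy, while the weighted F1 is not in general the harmonic mean of the weighted precision and recall.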

Author information

Authors and Affiliations

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
Hangbo Bao & Songhao Piao
Microsoft Research, Beijing, China
Shaohan Huang, Furu Wei, Lei Cui, Yu Wu & Ming Zhou
Beihang University, Beijing, China
Chuanqi Tan

Corresponding author

Correspondence to Hangbo Bao.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Jie Tang
National University of Singapore, Singapore, Singapore
Min-Yen Kan
Peking University, Beijing, China
Dongyan Zhao
Peking University, Beijing, China
Sujian Li
Zhengzhou University, Zhengzhou, China
Hongying Zan

About this paper

Cite this paper

Bao, H. et al. (2019). Neural Melody Composition from Lyrics. In: Tang, J., Kan, MY., Zhao, D., Li, S., Zan, H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science, vol 11838. Springer, Cham. https://doi.org/10.1007/978-3-030-32233-5_39

DOI: https://doi.org/10.1007/978-3-030-32233-5_39
Published: 30 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32232-8
Online ISBN: 978-3-030-32233-5
eBook Packages: Computer Science, Computer Science (R0)
