Procedural Content Generation of Rhythm Games Using Deep Learning Methods

  • Yubin Liang
  • Wanxiang Li
  • Kokolo Ikeda
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11863)


Rhythm games are a type of video game that is popular with many people. However, the game content of a rhythm game (the required actions and their timing) is usually hand-crafted by human designers. In this research, we propose a method that automatically generates game content from a music file for the 4k mode of the well-known rhythm game “OSU!”. Supervised learning is generally used to generate such game content. We propose two new techniques. The first is a “fuzzy label” method, which shows better performance on our training data. The second is a new model, C-BLSTM. On our test data, we improved the F-score of timestamp prediction from 0.8159 to 0.8430. Experiments also confirmed that human players found the generated beatmaps more natural than those of previous research.
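As a rough illustration of the metric quoted above, the sketch below computes an F-score between predicted and reference note timestamps. The abstract does not state how predictions are matched to ground truth, so the greedy one-to-one matching, the `tolerance` window, and the function name `timestamp_f_score` are all assumptions made for this example, not the authors' evaluation procedure.

```python
def timestamp_f_score(predicted, reference, tolerance=0.05):
    """F-score between two lists of timestamps given in seconds.

    Assumption: a predicted timestamp counts as a hit when it lies within
    `tolerance` seconds of a reference timestamp that is not yet matched.
    """
    predicted = sorted(predicted)
    reference = sorted(reference)
    i = j = matched = 0
    # Walk both sorted lists once, pairing timestamps greedily.
    while i < len(predicted) and j < len(reference):
        diff = predicted[i] - reference[j]
        if abs(diff) <= tolerance:
            matched += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # prediction is too early for every remaining reference
        else:
            j += 1  # reference note was missed by the predictions
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical timestamps, not data from the paper.
    pred = [0.50, 1.00, 1.52, 2.40]
    ref = [0.51, 1.00, 1.50, 2.00]
    print(timestamp_f_score(pred, ref))
```

In this made-up example three of the four predictions fall within the window, so precision and recall are both 3/4. A smaller `tolerance` makes the metric stricter about timing.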


Procedural Content Generation, Rhythm game, C-BLSTM



This research is financially supported by Japan Society for the Promotion of Science (JSPS) under contract number 17K00506.



Copyright information

© IFIP International Federation for Information Processing 2019

Authors and Affiliations

  1. School of Information Science, Japan Advanced Institute of Science and Technology, Nomi, Japan
