Convolutional Neural Networks and Transfer Learning Applied to Automatic Composition of Descriptive Music
Visual and musical arts have been strongly interconnected throughout history. The aim of this work is to compose music on the basis of the visual characteristics of a video. For this purpose, descriptive music is used as a link between image and sound, and a video fragment of the film Fantasia is analyzed in depth. Specifically, convolutional neural networks in combination with transfer learning are applied to extract image descriptors. In order to establish a relationship between the visual and musical information, Naive Bayes, Support Vector Machine, and Random Forest classifiers are applied. The obtained model is subsequently employed to compose descriptive music from a new video. The results of this proposal are compared with those of a previous work in order to evaluate the performance of the classifiers and the quality of the descriptive musical composition.
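The pipeline the abstract describes can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the CNN descriptors are simulated here as fixed-length feature vectors (in practice they would come from a pretrained network via transfer learning), and the frame-to-music labels are synthetic. Only the three named classifiers are taken from the source.

```python
# Hypothetical sketch of the described pipeline: CNN-based image descriptors
# (simulated as fixed-length feature vectors) are mapped to musical-descriptor
# classes using the three classifiers named in the abstract.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Stand-in for transfer-learning descriptors of video frames; each row is one
# frame, each label a musical class. Real descriptors would be extracted with
# a pretrained CNN (the paper's actual network is not specified here).
X = rng.normal(size=(200, 64))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic frame->music mapping

classifiers = {
    "naive_bayes": GaussianNB(),
    "svm": SVC(),
    "random_forest": RandomForestClassifier(random_state=0),
}

scores = {}
for name, clf in classifiers.items():
    clf.fit(X[:150], y[:150])                   # train on analyzed frames
    scores[name] = clf.score(X[150:], y[150:])  # evaluate on held-out frames

print(scores)
```

Once such a model is trained, composing music for a new video amounts to extracting descriptors for its frames and predicting the corresponding musical classes.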
Keywords: Descriptive music · Automatic composition · Image · Video · Transfer learning · Convolutional neural networks
This work was supported by the Spanish Ministerio de Economía y Competitividad and FEDER funds, under the project SURF: Intelligent System for integrated and sustainable management of urban fleets (TIN2015-65515-C4-3-R).