Convolutional Neural Networks and Transfer Learning Applied to Automatic Composition of Descriptive Music

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 801)

Abstract

The visual and musical arts have been strongly interconnected throughout history. The aim of this work is to compose music on the basis of the visual characteristics of a video. For this purpose, descriptive music is used as a link between image and sound, and a video fragment of the film Fantasia is analyzed in depth. Specifically, convolutional neural networks are combined with transfer learning to extract image descriptors. To establish a relationship between the visual and musical information, Naive Bayes, Support Vector Machine and Random Forest classifiers are applied. The resulting model is subsequently used to compose descriptive music from a new video. The results of this proposal are compared with those of a previous work in order to evaluate the performance of the classifiers and the quality of the descriptive musical composition.
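The classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the descriptors below are synthetic random vectors standing in for CNN features, the two labels ("low" and "high") are hypothetical musical classes, and the hand-written Gaussian Naive Bayes is only one simple instance of the classifier families the abstract names.

```python
import math
import random
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a log-prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict(model, features):
    """Return the label with the highest Gaussian log-posterior."""
    def log_posterior(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(features, means, variances)
        )
    return max(model, key=lambda label: log_posterior(model[label]))


def synthetic_descriptors(center, n, dim, rng):
    """Hypothetical stand-in for CNN descriptors of video frames."""
    return [[rng.gauss(center, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
DIM = 8  # assumption: real CNN descriptors would be much longer

# Assumption: two made-up musical labels, one per cluster of frames.
X_train = (synthetic_descriptors(0.0, 50, DIM, rng)
           + synthetic_descriptors(5.0, 50, DIM, rng))
y_train = ["low"] * 50 + ["high"] * 50

model = fit_gaussian_nb(X_train, y_train)
print(predict(model, [0.1] * DIM))
print(predict(model, [4.9] * DIM))
```

In a full pipeline, the synthetic vectors would be replaced by descriptors taken from a pretrained convolutional network, and the predicted labels would drive the choice of musical material for each frame of the new video.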

Acknowledgments

This work was supported by the Spanish Ministerio de Economía y Competitividad and FEDER funds through the project SURF: Intelligent System for integrated and sustainable management of urban fleets (TIN2015-65515-C4-3-R).

Author information

Corresponding author

Correspondence to Lucía Martín-Gómez.

Copyright information

© 2019 Springer Nature Switzerland AG

Cite this paper

Martín-Gómez, L., Pérez-Marcos, J., Navarro-Cáceres, M., Rodríguez-González, S. (2019). Convolutional Neural Networks and Transfer Learning Applied to Automatic Composition of Descriptive Music. In: Rodríguez, S., et al. Distributed Computing and Artificial Intelligence, Special Sessions, 15th International Conference. DCAI 2018. Advances in Intelligent Systems and Computing, vol 801. Springer, Cham. https://doi.org/10.1007/978-3-319-99608-0_31
