Abstract
Creativity is a fundamental feature of human intelligence and a challenge for artificial intelligence (AI). In recent years, AI has made tremendous progress on many single tasks with single models, such as classification, detection, and parsing. As this development continues, AI is increasingly applied to more complex tasks, multitasking for example, and research on multimodal fusion has naturally become a new trend. In this paper, we propose a multimodal fusion framework and system for generating traditional Chinese paintings. We select suitable existing networks to generate the different elements of this art form, one of the world's oldest continuous artistic traditions, and then fuse these networks and elements to create a complete new painting. Meanwhile, we propose a divide-and-conquer strategy to generate large images with limited GPU resources. In our end-to-end system, a large input image is automatically turned into a traditional Chinese painting within minutes. The results show that our multimodal fusion framework works well and that AI methods perform well in traditional Chinese painting creation.
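The abstract only names the divide-and-conquer strategy, but the general idea of such tile-based processing can be illustrated. Below is a minimal sketch, assuming overlapping fixed-size tiles that each fit in GPU memory and a placeholder stylize_tile function standing in for one of the selected generation networks; the tile size, overlap, and all function names are illustrative assumptions, not details from the paper.

```python
import numpy as np

# Hypothetical per-tile generator: stands in for one of the selected
# networks (e.g. a style-transfer model). Identity here so the sketch runs.
def stylize_tile(tile: np.ndarray) -> np.ndarray:
    return tile.astype(np.float64)

def stylize_large_image(image: np.ndarray, tile: int = 512, overlap: int = 64) -> np.ndarray:
    """Divide-and-conquer: run the model on overlapping tiles that fit in
    GPU memory, then average the overlaps to hide seams when stitching."""
    assert 0 <= overlap < tile
    h, w, _ = image.shape
    out = np.zeros(image.shape, dtype=np.float64)
    weight = np.zeros((h, w, 1), dtype=np.float64)  # per-pixel coverage count
    step = tile - overlap
    for y in range(0, h, step):
        for x in range(0, w, step):
            y1, x1 = min(y + tile, h), min(x + tile, w)
            out[y:y1, x:x1] += stylize_tile(image[y:y1, x:x1])
            weight[y:y1, x:x1] += 1.0
    return (out / weight).astype(image.dtype)

# Usage: a 2048x2048 RGB image processed in 512x512 tiles.
painting = stylize_large_image(np.zeros((2048, 2048, 3), dtype=np.uint8))
```

Averaging the overlapping regions is one simple way to suppress seams between tiles; the actual system may use a different blending or fusion scheme.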
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Nos. U1536203, 61572493), IIE project (No. Y6Z0021102, No. Y7Z0241102) and CCF-Tencent Open Research Fund.
Cite this paper
Luo, S., Liu, S., Han, J., Guo, T.: Multimodal fusion for traditional Chinese painting generation. In: Hong, R., Cheng, W.-H., Yamasaki, T., Wang, M., Ngo, C.-W. (eds.) Advances in Multimedia Information Processing – PCM 2018. Lecture Notes in Computer Science, vol. 11166. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00764-5_3