Multimodal and Multiclass Semi-supervised Image-to-Image Translation
Abstract
In this paper, we propose a multimodal and multiclass semi-supervised image-to-image translation (MM-SSIT) framework to address the dilemma between expensive labeling work and the diversity requirements of image translation. A cross-domain adversarial autoencoder is proposed to learn disentangled latent codes: domain-invariant content codes and domain-specific style codes. The style codes are matched to a prior distribution, so that a series of meaningful samples can be generated from the prior space. The content codes are embedded into a multiclass joint data distribution through adversarial learning between a domain classifier and a category classifier, so that images of multiple classes can be generated at one time. Consequently, multimodal and multiclass cross-domain images are generated by jointly decoding the latent content codes and the sampled style codes. Finally, the networks for the MM-SSIT framework are designed and tested. Semi-supervised experiments with comparisons to a state-of-the-art approach show that the proposed framework generates high-quality and diverse images when only a few labeled samples are available. Further experiments in the unsupervised setting demonstrate that MM-SSIT is superior in learning disentangled representations and in domain adaptation.
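The abstract describes four moving parts: an encoder that splits each image into a content code and a style code, a style discriminator that matches style codes to a prior, domain and category classifiers that compete over the content codes, and a decoder that joins a content code with either the inferred or a sampled style code. The PyTorch sketch below makes this structure concrete; the layer sizes, the 32×32 input resolution, the Gaussian style prior, and all module names (`Encoder`, `Decoder`, `style_disc`, `domain_clf`, `category_clf`) are illustrative assumptions, not the networks published with MM-SSIT.

```python
# Minimal sketch of the MM-SSIT building blocks, under assumed sizes.
import torch
import torch.nn as nn

LATENT_C, LATENT_S, N_CLASSES, N_DOMAINS = 64, 8, 10, 2

class Encoder(nn.Module):
    """Splits an image into a domain-invariant content code and a
    domain-specific style code (the disentangled representation)."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten())
        self.to_content = nn.Linear(64 * 8 * 8, LATENT_C)  # assumes 32x32 input
        self.to_style = nn.Linear(64 * 8 * 8, LATENT_S)

    def forward(self, x):
        h = self.backbone(x)
        return self.to_content(h), self.to_style(h)

class Decoder(nn.Module):
    """Jointly decodes a content code and a (possibly sampled) style code."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_C + LATENT_S, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh())

    def forward(self, c, s):
        return self.net(torch.cat([c, s], dim=1))

# Adversarial heads: the style discriminator matches style codes to a prior
# (here a standard Gaussian); the domain classifier is trained adversarially
# against the encoder so content codes become domain-invariant, while the
# category classifier keeps them class-discriminative.
style_disc = nn.Sequential(nn.Linear(LATENT_S, 64), nn.ReLU(), nn.Linear(64, 1))
domain_clf = nn.Sequential(nn.Linear(LATENT_C, 64), nn.ReLU(), nn.Linear(64, N_DOMAINS))
category_clf = nn.Sequential(nn.Linear(LATENT_C, 64), nn.ReLU(), nn.Linear(64, N_CLASSES))

enc, dec = Encoder(), Decoder()
x = torch.randn(4, 3, 32, 32)                      # toy batch of images
content, style = enc(x)
recon = dec(content, style)                        # reconstruction path
sampled = dec(content, torch.randn(4, LATENT_S))   # multimodal translation path
print(recon.shape, sampled.shape)                  # both torch.Size([4, 3, 32, 32])
```

In the semi-supervised setting the abstract describes, the category classifier would presumably be trained only on the labeled subset, while the reconstruction and adversarial terms can use all images; sampling several style codes for one content code is what yields the multimodal outputs.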
Keywords
Image-to-image translation · Semi-supervised · Adversarial autoencoder · Adversarial learning
Acknowledgments
This work is supported by the National Natural Science Foundation of China (61762003), the Natural Science Foundation of Ningxia (2018AAC03124), and the Key R&D Program of Ningxia 2019 (Research on Intelligent Assembly Technology Based on Multi-source Information Fusion).