Abstract
Despite extensive research on transfer learning, no consensus has been reached on the optimal transfer method. To provide a unified theoretical understanding of transfer learning, we recast its crux as the pursuit of the optimal initialisation for the task to be transferred. To obtain such an initialisation, we propose a novel initialisation technique, adapted generative initialisation. Beyond boosting task transfer, the proposed initialisation can, more importantly, bound the transfer benefit so as to defend against devastating negative transfer. In the first stage of the proposed initialisation, the incongruence between a task and its assigned learner (model) is alleviated by feeding the knowledge of the target learner into the training of the source learner; the later generative stage then ensures that the adapted initialisation is properly produced for the target learner. The superiority of the proposed initialisation over conventional neural-network-based approaches was validated in a preliminary experiment on the MNIST dataset.
Notes
- 1. A learner in this article denotes any type of computational model, such as a neural network.
- 2. \(\rightarrow \) denotes the direction of knowledge transfer, e.g., T1D1 \(\rightarrow \) T2D2 means that knowledge is extracted from prior learning of a complex task by a cumbersome learner and then transferred to assist the learning of a simple task by a compact learner.
Acknowledgements
This study was partially supported by the Okawa Foundation for Information and Telecommunications and by the National Natural Science Foundation of China under Grant No. 61472117.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Bai, W., Quan, C., Luo, ZW. (2019). Adaptive Generative Initialization in Transfer Learning. In: Lee, R. (eds) Computer and Information Science. ICIS 2018. Studies in Computational Intelligence, vol 791. Springer, Cham. https://doi.org/10.1007/978-3-319-98693-7_5
DOI: https://doi.org/10.1007/978-3-319-98693-7_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98692-0
Online ISBN: 978-3-319-98693-7
eBook Packages: Intelligent Technologies and Robotics (R0)