Transfer Learning: Scenarios, Self-Taught Learning, and Multitask Learning
Abstract
Most supervised machine learning techniques, such as classification, rely on underlying assumptions such as the following: (a) the data distribution is the same at training and prediction time; (b) the label space is the same at training and prediction time; and (c) the feature space remains the same between training and prediction time. In many real-world scenarios, these assumptions do not hold because of the changing nature of the data.
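To make assumption (a) concrete, the sketch below (our illustration, not taken from the chapter; it assumes NumPy and scikit-learn are available) trains a linear classifier on one input distribution and evaluates it on a shifted one. The labeling rule is the same in both domains, but the linear boundary the model learns is only locally accurate, so it breaks down where the prediction-time data actually lives.

```python
# Minimal covariate-shift sketch (illustrative only, not from the chapter).
# The true labeling rule is nonlinear, so a linear classifier fit on the
# training region extrapolates poorly once the input distribution shifts.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def true_label(x):
    # Ground-truth rule shared by both domains: class 1 above a sine curve.
    return (x[:, 1] > np.sin(2.0 * x[:, 0])).astype(int)

# Assumption (a) holds within the training data ...
X_train = rng.normal(loc=0.0, scale=0.7, size=(2000, 2))
# ... but is violated at prediction time: same labeling rule, shifted inputs.
X_test = X_train + np.array([3.0, 0.0])

clf = LogisticRegression().fit(X_train, true_label(X_train))
print("accuracy on training distribution:", clf.score(X_train, true_label(X_train)))
print("accuracy under covariate shift:", clf.score(X_test, true_label(X_test)))
```

On a typical run, accuracy on the training distribution is high while accuracy under the shift drops toward chance; narrowing exactly this kind of train/prediction mismatch is what the transfer learning scenarios covered in this chapter aim at.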
About this chapter
Cite this chapter
Kamath, U., Liu, J., Whitaker, J. (2019). Transfer Learning: Scenarios, Self-Taught Learning, and Multitask Learning. In: Deep Learning for NLP and Speech Recognition. Springer, Cham. https://doi.org/10.1007/978-3-030-14596-5_10
DOI: https://doi.org/10.1007/978-3-030-14596-5_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-14595-8
Online ISBN: 978-3-030-14596-5