Transfer Learning: Scenarios, Self-Taught Learning, and Multitask Learning

Chapter in Deep Learning for NLP and Speech Recognition

Abstract

Most supervised machine learning techniques, such as classification, rely on underlying assumptions such as: (a) the data distributions at training and prediction time are similar; (b) the label space at training and prediction time is the same; and (c) the feature space remains the same between training and prediction time. In many real-world scenarios, these assumptions do not hold due to the changing nature of the data.
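To make assumption (a) concrete, the short sketch below (our illustration, not code from the chapter) trains a linear classifier on inputs drawn around the origin and then evaluates it on inputs shifted away from the training region while the labeling rule stays fixed: the classic covariate-shift setting. The synthetic data, the curved labeling rule, and all names are hypothetical.

```python
# A minimal sketch of covariate shift (assumption (a) failing): the synthetic
# data and labeling rule are hypothetical, not taken from the chapter.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def sample(n, center):
    # Inputs drawn around `center`; labels come from one fixed curved rule,
    # so only the input distribution changes between train and test.
    X = rng.normal(loc=center, scale=1.0, size=(n, 2))
    y = (X[:, 1] > X[:, 0] ** 2 - 1).astype(int)
    return X, y

# Training-time distribution: inputs centered at the origin.
X_train, y_train = sample(5000, center=(0.0, 0.0))
# Prediction-time distribution: the same labeling rule, shifted inputs.
X_test, y_test = sample(5000, center=(2.5, 0.0))

clf = LogisticRegression().fit(X_train, y_train)
print("accuracy on training distribution:",
      accuracy_score(y_train, clf.predict(X_train)))
print("accuracy under covariate shift:  ",
      accuracy_score(y_test, clf.predict(X_test)))
```

On a typical run the classifier scores well on data from the training distribution but near chance on the shifted one, which is exactly the failure mode that motivates the transfer learning scenarios this chapter surveys.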




Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Kamath, U., Liu, J., Whitaker, J. (2019). Transfer Learning: Scenarios, Self-Taught Learning, and Multitask Learning. In: Deep Learning for NLP and Speech Recognition. Springer, Cham. https://doi.org/10.1007/978-3-030-14596-5_10

  • DOI: https://doi.org/10.1007/978-3-030-14596-5_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-14595-8

  • Online ISBN: 978-3-030-14596-5

  • eBook Packages: Computer Science, Computer Science (R0)
