Transfer Learning: Scenarios, Self-Taught Learning, and Multitask Learning

Chapter in Deep Learning for NLP and Speech Recognition

Abstract

Most supervised machine learning techniques, such as classification, rely on underlying assumptions such as: (a) the data distributions at training and prediction time are similar; (b) the label space at training and prediction time is the same; and (c) the feature space remains the same between training and prediction time. In many real-world scenarios, these assumptions do not hold due to the changing nature of the data.
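To make assumption (a) concrete, the short sketch below (our illustration, not code from the chapter) trains a linear classifier on inputs drawn around the origin and then evaluates it on inputs shifted away from the training region while the labeling rule stays fixed: the classic covariate-shift setting. The synthetic data, the curved labeling rule, and all names are hypothetical.

```python
# A minimal sketch of covariate shift (assumption (a) failing): the synthetic
# data and labeling rule are hypothetical, not taken from the chapter.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def sample(n, center):
    # Inputs drawn around `center`; labels come from one fixed curved rule,
    # so only the input distribution changes between train and test.
    X = rng.normal(loc=center, scale=1.0, size=(n, 2))
    y = (X[:, 1] > X[:, 0] ** 2 - 1).astype(int)
    return X, y

# Training-time distribution: inputs centered at the origin.
X_train, y_train = sample(5000, center=(0.0, 0.0))
# Prediction-time distribution: the same labeling rule, shifted inputs.
X_test, y_test = sample(5000, center=(2.5, 0.0))

clf = LogisticRegression().fit(X_train, y_train)
print("accuracy on training distribution:",
      accuracy_score(y_train, clf.predict(X_train)))
print("accuracy under covariate shift:  ",
      accuracy_score(y_test, clf.predict(X_test)))
```

On a typical run the classifier scores well on data from the training distribution but near chance on the shifted one, which is exactly the failure mode that motivates the transfer learning scenarios this chapter surveys.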




Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Kamath, U., Liu, J., Whitaker, J. (2019). Transfer Learning: Scenarios, Self-Taught Learning, and Multitask Learning. In: Deep Learning for NLP and Speech Recognition. Springer, Cham. https://doi.org/10.1007/978-3-030-14596-5_10

  • DOI: https://doi.org/10.1007/978-3-030-14596-5_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-14595-8

  • Online ISBN: 978-3-030-14596-5

  • eBook Packages: Computer Science, Computer Science (R0)
