
Transfer Learning: Scenarios, Self-Taught Learning, and Multitask Learning

  • Uday Kamath
  • John Liu
  • James Whitaker
Chapter

Abstract

Most supervised machine learning techniques, such as classification, rely on underlying assumptions such as: (a) the data distribution is the same at training time and at prediction time; (b) the label space is the same at training time and at prediction time; and (c) the feature space does not change between training and prediction. In many real-world scenarios, these assumptions do not hold because the data changes over time.
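
The chapter's title names multitask learning as one way such scenarios are handled: related tasks share a common representation instead of each being fit under its own fixed distribution. As a minimal sketch only, not code from the chapter, the following PyTorch example illustrates hard parameter sharing between two hypothetical classification tasks; all module names, dimensions, and data are invented placeholders.

import torch
import torch.nn as nn

# Shared encoder whose parameters receive gradients from every task
# (hard parameter sharing); dimensions are arbitrary placeholders.
class SharedEncoder(nn.Module):
    def __init__(self, input_dim=100, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

# Two task-specific heads on top of the shared encoder (hypothetical
# tasks A and B with 5 and 3 classes, respectively).
class MultitaskModel(nn.Module):
    def __init__(self, input_dim=100, hidden_dim=64, classes_a=5, classes_b=3):
        super().__init__()
        self.encoder = SharedEncoder(input_dim, hidden_dim)
        self.head_a = nn.Linear(hidden_dim, classes_a)
        self.head_b = nn.Linear(hidden_dim, classes_b)

    def forward(self, x):
        h = self.encoder(x)
        return self.head_a(h), self.head_b(h)

# One joint training step on random stand-in data: the per-task losses are
# summed, so the shared encoder is updated by both tasks.
model = MultitaskModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 100)          # shared input batch (placeholder features)
y_a = torch.randint(0, 5, (32,))  # labels for task A (placeholder)
y_b = torch.randint(0, 3, (32,))  # labels for task B (placeholder)

logits_a, logits_b = model(x)
loss = loss_fn(logits_a, y_a) + loss_fn(logits_b, y_b)
optimizer.zero_grad()
loss.backward()
optimizer.step()

Summing the per-task losses is the simplest way to let both tasks shape the shared representation; weighted sums or alternating task batches are common variants.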


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Uday Kamath (1)
  • John Liu (2)
  • James Whitaker (1)
  1. Digital Reasoning Systems Inc., McLean, USA
  2. Intelluron Corporation, Nashville, USA
