Transfer Learning for Cross-Domain Sequence Tagging Tasks

  • Meng Cao
  • Chaohe Zhang
  • Dancheng Li
  • Qingping Zheng
  • Ling Luo
Conference paper
Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 70)

Abstract

Neural networks have proved effective for sequence tagging. Because they require no task-specific knowledge, the same network architecture can be applied to a wide range of applications. However, domain-specific sequence tagging still suffers from a lack of available data. First, there is often too little annotated in-domain data to train a recurrent neural network adequately. Second, a corpus suitable for training domain-specific word embeddings may not be available. In this paper, we explore transfer learning for domain-specific named entity recognition (NER). We propose a modified skip-gram model for training cross-domain word embeddings, and we use a source task with a large number of annotations (e.g., NER on CoNLL-2003) to improve performance on a target task with fewer annotations (e.g., NER on a biomedical dataset). We evaluate our approach on a range of sequence tagging benchmarks, and the results show that it achieves significant improvements.
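
The paper's specific skip-gram modification is not detailed in the abstract, so the following is only an illustrative sketch of the general transfer setup described above: a sequence tagger is pretrained on a large annotated source task (e.g. CoNLL-2003 NER), and its embedding and encoder weights are then reused and fine-tuned on a smaller target task (e.g. biomedical NER). The sketch assumes PyTorch, a shared cross-domain vocabulary, and random placeholder tensors standing in for real, preprocessed data; none of the names below come from the paper itself.

    import torch
    import torch.nn as nn

    class BiLSTMTagger(nn.Module):
        """A simple tagger: embeddings -> BiLSTM encoder -> per-token tag scores."""
        def __init__(self, vocab_size, emb_dim, hidden_dim, num_tags):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, emb_dim)
            self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                                   bidirectional=True)
            self.classifier = nn.Linear(2 * hidden_dim, num_tags)  # task-specific head

        def forward(self, token_ids):
            encoded, _ = self.encoder(self.embedding(token_ids))
            return self.classifier(encoded)  # (batch, seq_len, num_tags)

    def train(model, token_ids, tag_ids, epochs=3):
        optimizer = torch.optim.Adam(model.parameters())
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(epochs):
            logits = model(token_ids)
            loss = loss_fn(logits.view(-1, logits.size(-1)), tag_ids.view(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    # 1) Pretrain on the large source task (placeholder batch stands in for CoNLL-2003).
    source_model = BiLSTMTagger(vocab_size=20000, emb_dim=100, hidden_dim=128, num_tags=9)
    src_tokens = torch.randint(0, 20000, (32, 25))
    src_tags = torch.randint(0, 9, (32, 25))
    train(source_model, src_tokens, src_tags)

    # 2) Transfer: copy embedding and encoder weights into a target model with a new
    #    classification head sized for the target tag set, then fine-tune on target data.
    target_model = BiLSTMTagger(vocab_size=20000, emb_dim=100, hidden_dim=128, num_tags=5)
    target_model.embedding.load_state_dict(source_model.embedding.state_dict())
    target_model.encoder.load_state_dict(source_model.encoder.state_dict())
    tgt_tokens = torch.randint(0, 20000, (32, 25))
    tgt_tags = torch.randint(0, 5, (32, 25))
    train(target_model, tgt_tokens, tgt_tags)

In a setting like the paper's, the embedding layer would be initialized from cross-domain word embeddings rather than trained from scratch, and the shared encoder could be frozen or fine-tuned with a reduced learning rate on the target data.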

Keywords

Sequence tagging · Transfer learning · Word embeddings

Acknowledgements

We would like to thank Prof. Li from Northeastern University, China, and Dr. Zheng from the IBM Innovation Lab, without whose help our work could not have been finished so smoothly. We also thank the reviewers of an earlier draft of this paper for their useful feedback and the anonymous reviewers for their constructive comments.

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Meng Cao (1)
  • Chaohe Zhang (1)
  • Dancheng Li (1, corresponding author)
  • Qingping Zheng (2)
  • Ling Luo (2)
  1. Northeastern University, Shenyang, China
  2. IBM China Development Lab, Beijing, China
