Transductive Learning with String Kernels for Cross-Domain Text Classification

  • Conference paper
Neural Information Processing (ICONIP 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11303)

Abstract

For many text classification tasks, the lack of labeled data in a target domain poses a major problem. Although classifiers for a target domain can be trained on labeled text data from a related source domain, the accuracy of such classifiers is usually lower in the cross-domain setting. Recently, string kernels have obtained state-of-the-art results in various text classification tasks such as native language identification and automatic essay scoring. Moreover, classifiers based on string kernels have been found to be robust to the distribution gap between different domains. In this paper, we formally describe an algorithm composed of two simple yet effective transductive learning approaches that further improve the results of string kernels in cross-domain settings. By adapting string kernels to the test set without using the ground-truth test labels, we report significantly better accuracy rates in cross-domain English polarity classification.
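
To make the idea of a string kernel concrete, the following is a minimal sketch of a generic character n-gram (spectrum) kernel, computed jointly over labeled source-domain texts and unlabeled target-domain texts. It is an illustration under stated assumptions, not the algorithm described in the paper: the kernel choice, the n-gram length, the normalization, and the function names `ngram_counts`, `spectrum_kernel`, and `kernel_matrix` are all hypothetical. The only point it shows is that pairwise similarities involving test samples can be computed without any test labels.

```python
from collections import Counter
from math import sqrt


def ngram_counts(text, n=3):
    """Count overlapping character n-grams in a text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def spectrum_kernel(a, b, n=3):
    """Unnormalized spectrum kernel: dot product of n-gram count vectors."""
    ca, cb = ngram_counts(a, n), ngram_counts(b, n)
    if len(ca) > len(cb):
        ca, cb = cb, ca  # iterate over the smaller set of n-grams
    return sum(count * cb[gram] for gram, count in ca.items())


def kernel_matrix(texts, n=3):
    """Normalized kernel matrix over all texts (labeled and unlabeled).

    Assumption: cosine-style normalization K[i][j] / sqrt(K[i][i] * K[j][j]);
    texts shorter than n get similarity 0.0 with everything.
    """
    raw = [[spectrum_kernel(a, b, n) for b in texts] for a in texts]
    size = len(texts)
    return [
        [
            raw[i][j] / sqrt(raw[i][i] * raw[j][j])
            if raw[i][i] and raw[j][j] else 0.0
            for j in range(size)
        ]
        for i in range(size)
    ]


# Hypothetical toy data: labels exist only for the source-domain texts.
source_texts = ["great book, loved it", "terrible plot, boring"]
target_texts = ["loved this blender", "boring and terrible kettle"]

# The matrix is built from source and target texts together, so the
# similarities to test samples are available without any test labels.
K = kernel_matrix(source_texts + target_texts)
```

A matrix like `K` could then be passed to any kernel classifier that accepts precomputed kernels, with only the rows of labeled source texts used for training.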

Author information

Corresponding author

Correspondence to Radu Tudor Ionescu.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Ionescu, R.T., Butnaru, A.M. (2018). Transductive Learning with String Kernels for Cross-Domain Text Classification. In: Cheng, L., Leung, A., Ozawa, S. (eds) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science, vol 11303. Springer, Cham. https://doi.org/10.1007/978-3-030-04182-3_42

  • DOI: https://doi.org/10.1007/978-3-030-04182-3_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04181-6

  • Online ISBN: 978-3-030-04182-3

  • eBook Packages: Computer Science, Computer Science (R0)
