Abstract
For many text classification tasks, there is a major problem posed by the lack of labeled data in a target domain. Although classifiers for a target domain can be trained on labeled text data from a related source domain, the accuracy of such classifiers is usually lower in the cross-domain setting. Recently, string kernels have obtained state-of-the-art results in various text classification tasks such as native language identification or automatic essay scoring. Moreover, classifiers based on string kernels have been found to be robust to the distribution gap between different domains. In this paper, we formally describe an algorithm composed of two simple yet effective transductive learning approaches to further improve the results of string kernels in cross-domain settings. By adapting string kernels to the test set without using the ground-truth test labels, we report significantly better accuracy rates in cross-domain English polarity classification.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Bhatt, S.H., Semwal, D., Roy, S.: An iterative similarity based adaptation technique for cross-domain text classification. In: Proceedings of CONLL, pp. 52–61 (2015)
Blitzer, J., Dredze, M., Pereira, F.: Biographies, bollywood, boomboxes and blenders: domain adaptation for sentiment classification. In: Proceedings of ACL, pp. 187–205 (2007)
Bollegala, D., Weir, D., Carroll, J.: Cross-domain sentiment classification using a sentiment sensitive thesaurus. IEEE Trans. Knowl. Data Eng. 25(8), 1719–1731 (2013)
Butnaru, A.M., Ionescu, R.T.: UnibucKernel reloaded: first place in Arabic dialect identification for the second year in a row. In: Proceedings of VarDial Workshop of COLING, pp. 77–87 (2018)
Ceci, M.: Hierarchical text categorization in a transductive setting. In: Proceedings of ICDMW, pp. 184–191, December 2008
Chang, W.C., Wu, Y., Liu, H., Yang, Y.: Cross-domain kernel induction for transfer learning. In: Proceedings of AAAI, pp. 1763–1769, February 2017
Cozma, M., Butnaru, A., Ionescu, R.T.: Automated essay scoring with string kernels and word embeddings. In: Proceedings of ACL, pp. 503–509 (2018)
Daumé III, H.: Frustratingly easy domain adaptation. In: Proceedings of ACL, pp. 256–263 (2007)
Dietterich, T.G.: Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput. 10(7), 1895–1923 (1998)
Escalante, H.J., Solorio, T., Montes-y-Gómez, M.: Local histograms of character n-grams for authorship attribution. In: Proceedings of ACL: HLT, vol. 1, pp. 288–298 (2011)
Fernández, A.M., Esuli, A., Sebastiani, F.: Distributional correspondence indexing for cross-lingual and cross-domain sentiment classification. J. Artif. Intell. Res. 55(1), 131–163 (2016)
Franco-Salvador, M., Cruz, F.L., Troyano, J.A., Rosso, P.: Cross-domain polarity classification using a knowledge-enhanced meta-classifier. Knowl. Based Syst. 86, 46–56 (2015)
Giménez-Pérez, R.M., Franco-Salvador, M., Rosso, P.: Single and cross-domain polarity classification using string kernels. In: Proceedings of EACL, pp. 558–563, April 2017
Guo, Y., Xiao, M.: Transductive representation learning for cross-lingual text classification. In: Proceedings of ICDM, pp. 888–893, December 2012
Huang, X., Rao, Y., Xie, H., Wong, T.L., Wang, F.L.: Cross-domain sentiment classification via topic-related TrAdaBoost. In: Proceedings of AAAI, pp. 4939–4940 (2017)
Ifrim, G., Weikum, G.: Transductive learning for text classification using explicit knowledge models. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) PKDD 2006. LNCS (LNAI), vol. 4213, pp. 223–234. Springer, Heidelberg (2006). https://doi.org/10.1007/11871637_24
Ionescu, R.T.: A fast algorithm for local rank distance: application to arabic native language identification. In: Arik, S., Huang, T., Lai, W.K., Liu, Q. (eds.) ICONIP 2015. LNCS, vol. 9490, pp. 390–400. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-26535-3_45
Ionescu, R.T., Butnaru, A.: Learning to identify arabic and german dialects using multiple kernels. In: Proceedings of VarDial Workshop of EACL, pp. 200–209 (2017)
Ionescu, R.T., Butnaru, A.M.: Improving the results of string kernels in sentiment analysis and Arabic dialect identification by adapting them to your test set. In: Proceedings of EMNLP (2018)
Ionescu, R.T., Popescu, M.: Native language identification with string kernels. In: Ionescu, R.T., Popescu, M. (eds.) Knowledge Transfer between Computer Vision and Text Mining. ACVPR, pp. 193–227. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-30367-3_8
Ionescu, R.T., Popescu, M.: UnibucKernel: an approach for Arabic dialect identification based on multiple string kernels. In: Proceedings of VarDial Workshop of COLING, pp. 135–144 (2016)
Ionescu, R.T., Popescu, M.: Can string kernels pass the test of time in native language identification? In: Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, pp. 224–234 (2017)
Ionescu, R.T., Popescu, M., Cahill, A.: Can characters reveal your native language? A language-independent approach to native language identification. In: Proceedings of EMNLP, pp. 1363–1373, October 2014
Ionescu, R.T., Popescu, M., Cahill, A.: String kernels for native language identification: insights from behind the curtains. Comput. Linguist. 42(3), 491–525 (2016)
Joachims, T.: Transductive inference for text classification using support vector machines. In: Proceedings of ICML, pp. 200–209 (1999)
Li, T., Sindhwani, V., Ding, C., Zhang, Y.: Knowledge transformation for cross-domain sentiment classification. In: Proceedings of SIGIR, pp. 716–717 (2009)
Lodhi, H., Saunders, C., Shawe-Taylor, J., Cristianini, N., Watkins, C.J.C.H.: Text classification using string kernels. J. Mach. Learn. Res. 2, 419–444 (2002)
Long, M., Wang, J., Ding, G., Pan, S.J., Yu, P.S.: Adaptation regularization: a general framework for transfer learning. IEEE Trans. Knowl. Data Eng. 26(5), 1076–1089 (2014)
Lui, M., Baldwin, T.: Cross-domain feature selection for language identification. In: Proceedings of IJCNLP, pp. 553–561 (2011)
Luo, K.H., Deng, Z.H., Yu, H., Wei, L.C.: JEAM: a novel model for cross-domain sentiment classification based on emotion analysis. In: Proceedings of EMNLP, pp. 2503–2508 (2015)
Nelakurthi, A.R., Tong, H., Maciejewski, R., Bliss, N., He, J.: User-guided cross-domain sentiment classification. In: Proceedings of SDM (2017)
Pan, S.J., Ni, X., Sun, J.T., Yang, Q., Chen, Z.: Cross-domain sentiment classification via spectral feature alignment. In: Proceedings of WWW, pp. 751–760 (2010)
Ponomareva, N., Thelwall, M.: Semi-supervised vs. cross-domain graphs for sentiment analysis. In: Proceedings of RANLP, pp. 571–578, September 2013
Popescu, M., Grozea, C.: Kernel methods and string kernels for authorship analysis. In: Proceedings of CLEF (Online Working Notes/Labs/Workshop), September 2012
Popescu, M., Grozea, C., Ionescu, R.T.: HASKER: an efficient algorithm for string kernels. Application to polarity classification in various languages. In: Proceedings of KES, pp. 1755–1763 (2017)
Popescu, M., Ionescu, R.T.: The story of the characters, the DNA and the native language. In: Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 270–278, June 2013
Sener, O., Song, H.O., Saxena, A., Savarese, S.: Learning transferrable representations for unsupervised domain adaptation. In: Proceedings of NIPS, pp. 2110–2118 (2016)
Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004)
Shu, L., Latecki, L.J.: Transductive domain adaptation with affinity learning. In: Proceedings of CIKM, pp. 1903–1906. ACM (2015)
Sun, B., Feng, J., Saenko, K.: Return of frustratingly easy domain adaptation. In: Proceedings of AAAI, pp. 2058–2065 (2016)
Zampieri, M., et al.: Findings of the VarDial evaluation campaign 2017. In: Proceedings of VarDial Workshop of EACL, pp. 1–15 (2017)
Zhuang, F., Luo, P., Yin, P., He, Q., Shi, Z.: Concept learning for cross-domain text classification: a general probabilistic framework. In: Proceedings of IJCAI, pp. 1960–1966 (2013)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Ionescu, R.T., Butnaru, A.M. (2018). Transductive Learning with String Kernels for Cross-Domain Text Classification. In: Cheng, L., Leung, A., Ozawa, S. (eds) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science(), vol 11303. Springer, Cham. https://doi.org/10.1007/978-3-030-04182-3_42
Download citation
DOI: https://doi.org/10.1007/978-3-030-04182-3_42
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04181-6
Online ISBN: 978-3-030-04182-3
eBook Packages: Computer ScienceComputer Science (R0)