Abstract
Sentiment classification has recently attracted researchers worldwide, but most sentiment corpora are in English, which limits progress on sentiment classification in other languages. Cross-lingual sentiment classification uses annotated sentiment corpora in one language (e.g. English) as training data to predict the sentiment polarity of data in another language (e.g. Chinese). In this paper, we design a bi-view non-negative matrix tri-factorization (BNMTF) model for the cross-lingual sentiment classification problem. We use a machine translation service so that both training and test data have two representations, one in the source language and the other in the target language. Our BNMTF model is derived from the non-negative matrix tri-factorization models in both languages in order to make more accurate predictions. Our BNMTF model has three main advantages: (1) combining information from the two views; (2) incorporating lexical knowledge and training document label knowledge; and (3) adding information from test documents. Experimental results show that our BNMTF model is effective and can outperform baseline approaches to cross-lingual sentiment classification.
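For readers unfamiliar with the building block named in the abstract, the following is a minimal sketch of plain single-view non-negative matrix tri-factorization, not the BNMTF model itself. It assumes a documents-by-terms count matrix X and approximates it as D S Wᵀ using generic multiplicative updates for the squared Frobenius error. The function name `nmtf`, the factor names `D`, `S`, `W`, the toy matrix, and all parameter values are illustrative assumptions; the bi-view coupling, lexical priors, and label constraints described in the abstract are not implemented here.

```python
import numpy as np


def nmtf(X, k_docs, k_terms, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix X (docs x terms) as D @ S @ W.T.

    D (docs x k_docs) holds document-cluster weights, W (terms x k_terms)
    holds term-cluster weights, and S links the two kinds of cluster.
    Generic multiplicative updates; an illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_terms = X.shape
    D = rng.random((n_docs, k_docs))
    S = rng.random((k_docs, k_terms))
    W = rng.random((n_terms, k_terms))
    errors = []
    for _ in range(n_iter):
        # Each factor is rescaled elementwise by a ratio of non-negative
        # terms, so all three factors stay non-negative throughout.
        D *= (X @ W @ S.T) / (D @ S @ W.T @ W @ S.T + eps)
        W *= (X.T @ D @ S) / (W @ S.T @ D.T @ D @ S + eps)
        S *= (D.T @ X @ W) / (D.T @ D @ S @ W.T @ W + eps)
        # Record the Frobenius reconstruction error after this sweep.
        errors.append(np.linalg.norm(X - D @ S @ W.T))
    return D, S, W, errors


# Hypothetical toy data: 4 documents over 6 terms, forming two blocks.
X = np.array([
    [3.0, 2.0, 1.0, 0.0, 0.0, 0.0],
    [2.0, 3.0, 2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 2.0, 3.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 2.0, 3.0],
])
D, S, W, errors = nmtf(X, k_docs=2, k_terms=2)

# A document's cluster is read off as the largest entry in its row of D.
doc_clusters = D.argmax(axis=1)
```

In a sentiment setting the two document clusters would stand for the two polarities, and a model along the lines of the abstract would additionally tie the factorizations of the source-language and target-language views together.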
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pan, J., Xue, GR., Yu, Y., Wang, Y. (2011). Cross-Lingual Sentiment Classification via Bi-view Non-negative Matrix Tri-Factorization. In: Huang, J.Z., Cao, L., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2011. Lecture Notes in Computer Science, vol 6634. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20841-6_24
DOI: https://doi.org/10.1007/978-3-642-20841-6_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20840-9
Online ISBN: 978-3-642-20841-6
eBook Packages: Computer Science, Computer Science (R0)