Research on Text Categorization Based on a Weakly-Supervised Transfer Learning Method

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2012)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7182)

Abstract

This paper presents a text categorization method based on weakly-supervised transfer learning, which does not require tagging new training documents when facing classification tasks in a new domain. Instead, it makes use of already tagged documents from other domains to accomplish the automatic categorization task. By extracting linguistic information such as part-of-speech, semantic, and keyword co-occurrence features, we construct a domain-adaptive transfer knowledge base. Related experiments show that the presented method improves text categorization performance on a traditional corpus, and that its results are only about 5% lower than the baseline on cross-domain classification tasks, which demonstrates the effectiveness of the method.
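
To make one of the features named in the abstract concrete, the following is a minimal sketch of document-level keyword co-occurrence counting over tagged source-domain documents. It is an illustration only, not the authors' implementation: the function name `cooccurrence_counts`, the toy documents, and the choice to count each keyword pair once per document are all assumptions, since the abstract does not specify how co-occurrence is measured.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how many documents contain each unordered keyword pair.

    `documents` is an iterable of token lists. This is an assumed,
    simplified definition of co-occurrence (same document), not the
    definition used in the paper.
    """
    counts = Counter()
    for tokens in documents:
        # Distinct tokens only, sorted so each pair has one canonical order.
        for pair in combinations(sorted(set(tokens)), 2):
            counts[pair] += 1
    return counts


# Hypothetical, already tokenized source-domain documents (toy data).
source_docs = [
    ["stock", "market", "price"],
    ["stock", "price", "bank"],
    ["match", "team", "goal"],
]

pair_counts = cooccurrence_counts(source_docs)
print(pair_counts[("price", "stock")])
```

Pairs that co-occur often in tagged documents could then serve as one kind of entry in a transferable knowledge base; how the paper combines them with part-of-speech and semantic information is not stated in the abstract.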

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zheng, D., Zhang, C., Fei, G., Zhao, T. (2012). Research on Text Categorization Based on a Weakly-Supervised Transfer Learning Method. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2012. Lecture Notes in Computer Science, vol 7182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28601-8_13

  • DOI: https://doi.org/10.1007/978-3-642-28601-8_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28600-1

  • Online ISBN: 978-3-642-28601-8

  • eBook Packages: Computer Science, Computer Science (R0)
