
A New Pairwise Ensemble Approach for Text Classification

  • Yan Liu
  • Jaime Carbonell
  • Rong Jin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2837)

Abstract

Text classification, whether by topic or genre, is an important task that contributes to text extraction, retrieval, summarization, and question answering. In this paper we present a new pairwise ensemble approach, which uses pairwise Support Vector Machine (SVM) classifiers as base classifiers and an “input-dependent latent variable” method for model combination. This new approach better captures the characteristics of genre classification, including its heterogeneous nature. Our experiments on two multi-genre collections and one topic-based classification dataset show that the pairwise ensemble method outperforms both boosting, which has been demonstrated to be a powerful ensemble approach, and Error-Correcting Output Codes (ECOC), which applies pairwise-like classifiers to multiclass classification problems.
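
As a rough illustration of the pairwise decomposition named in the abstract, the sketch below trains one linear SVM per pair of classes and combines their outputs by simple majority voting. Voting is a stand-in assumption: the paper combines the base classifiers with an input-dependent latent variable method, which the abstract does not describe in enough detail to reproduce here. The function names, hyperparameters, vocabulary, and toy bag-of-words counts are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=50):
    """Fit a linear SVM by per-sample subgradient descent on the
    L2-regularized hinge loss. Labels in y must be +1 or -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Shrink the weights (gradient of the L2 penalty).
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Hinge loss is active: move toward classifying xi correctly.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def train_pairwise(X, labels):
    """Train one binary SVM for every unordered pair of classes."""
    models = {}
    for pos, neg in combinations(sorted(set(labels)), 2):
        pair_X = [x for x, l in zip(X, labels) if l in (pos, neg)]
        pair_y = [1 if l == pos else -1 for l in labels if l in (pos, neg)]
        models[(pos, neg)] = train_linear_svm(pair_X, pair_y)
    return models


def predict(models, x):
    """Combine pairwise decisions by majority vote (assumed stand-in for
    the paper's input-dependent combination)."""
    votes = Counter()
    for (pos, neg), (w, b) in models.items():
        score = sum(wj * xj for wj, xj in zip(w, x)) + b
        votes[pos if score >= 0 else neg] += 1
    return votes.most_common(1)[0][0]


# Hypothetical toy data: word counts over a six-word vocabulary.
VOCAB = ["goal", "match", "stock", "market", "recipe", "oven"]
DOCS = [
    [2, 1, 0, 0, 0, 0], [1, 2, 0, 0, 0, 0],  # sports
    [0, 0, 2, 1, 0, 0], [0, 0, 1, 2, 0, 0],  # finance
    [0, 0, 0, 0, 2, 1], [0, 0, 0, 0, 1, 2],  # cooking
]
LABELS = ["sports", "sports", "finance", "finance", "cooking", "cooking"]

MODELS = train_pairwise(DOCS, LABELS)
print(predict(MODELS, [3, 1, 0, 0, 0, 0]))
```

With K classes this decomposition trains K(K-1)/2 binary classifiers, each seeing only the documents of its two classes. A fixed vote treats every pairwise classifier as equally reliable for every document, which is the limitation an input-dependent combination is meant to address.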

Keywords

Support Vector Machine; Text Categorization; Ensemble Approach; Latent Variable Approach; Hierarchical Mixture
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Yan Liu (1)
  • Jaime Carbonell (1)
  • Rong Jin (1)
  1. School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
