Text Classification: Basic Models



In classification, the corpus is partitioned into classes that are typically defined by application-specific criteria. To enable this partitioning, training examples are provided that associate data points with labels indicating their class membership. For example, training documents extracted from a news portal on political matters might each be tagged with one of three labels, such as “senate,” “congress,” and “legislation.”
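The supervised setup can be illustrated with a minimal sketch: a hypothetical toy corpus of labeled documents (the labels mirror the “senate”/“congress”/“legislation” example above) and a simple multinomial naive Bayes classifier with Laplace smoothing. This is an illustrative assumption-laden example, not a prescription from this chapter; the document names, counts, and helper functions are invented for the sketch.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled training examples: (document text, class label).
train = [
    ("senate vote floor debate", "senate"),
    ("senate committee hearing", "senate"),
    ("congress house representatives", "congress"),
    ("congress house session", "congress"),
    ("legislation bill amendment law", "legislation"),
    ("legislation bill passed law", "legislation"),
]

def fit(examples):
    """Collect class counts, per-class word counts, and the vocabulary."""
    class_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        for w in text.split():
            word_counts[label][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab

def predict(text, class_counts, word_counts, vocab):
    """Return the label maximizing log P(class) + sum_w log P(w | class)."""
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, c in class_counts.items():
        score = math.log(c / total)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)  # Laplace smoothing
        for w in text.split():
            if w in vocab:  # ignore out-of-vocabulary words
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = fit(train)
print(predict("bill amendment law", *model))  # -> legislation
```

An unlabeled test document is assigned to the class whose smoothed word distribution makes it most probable; here, “bill amendment law” overlaps only with the “legislation” training documents.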



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

IBM T. J. Watson Research Center, Yorktown Heights, USA