Leveraging Higher Order Dependencies between Features for Text Classification

  • Murat C. Ganiz
  • Nikita I. Lytkin
  • William M. Pottenger
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5781)


Traditional machine learning methods only consider relationships between feature values within individual data instances while disregarding the dependencies that link features across instances. In this work, we develop a general approach to supervised learning by leveraging higher-order dependencies between features. We introduce a novel Bayesian framework for classification named Higher Order Naive Bayes (HONB). Unlike approaches that assume data instances are independent, HONB leverages co-occurrence relations between feature values across different instances. Additionally, we generalize our framework by developing a novel data-driven space transformation that allows any classifier operating in vector spaces to take advantage of these higher-order co-occurrence relations. Results obtained on several benchmark text corpora demonstrate that higher-order approaches achieve significant improvements in classification accuracy over the baseline (first-order) methods.
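As an informal illustration of what a higher-order co-occurrence relation between features might look like, the sketch below finds term pairs that never appear in the same document but are linked through a shared intermediate term. It is not the HONB algorithm or the space transformation described in the paper; the function names `first_order_pairs` and `second_order_pairs` and the toy corpus are assumptions made purely for this example.

```python
from itertools import combinations


def first_order_pairs(docs):
    """Return unordered term pairs that co-occur in at least one document."""
    pairs = set()
    for doc in docs:
        # Sort so each pair is stored once, as (smaller, larger).
        for a, b in combinations(sorted(set(doc)), 2):
            pairs.add((a, b))
    return pairs


def second_order_pairs(docs):
    """Return term pairs linked only through an intermediate term.

    A pair (a, c) qualifies when some term b co-occurs with a and with c,
    while a and c never share a document themselves. Because a and c never
    share a document, the two links necessarily come from different documents.
    """
    first = first_order_pairs(docs)

    # Build an adjacency map of the first-order co-occurrence graph.
    neighbours = {}
    for a, b in first:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    second = set()
    for linked in neighbours.values():
        # Any two neighbours of the same term are at most two hops apart.
        for a, c in combinations(sorted(linked), 2):
            if (a, c) not in first:
                second.add((a, c))
    return second


# Hypothetical toy corpus: each document is a list of terms.
docs = [
    ["bayes", "classifier"],
    ["classifier", "kernel"],
    ["kernel", "margin"],
]

print(sorted(second_order_pairs(docs)))
```

In this toy corpus, "bayes" and "kernel" never occur together, yet both co-occur with "classifier" in different documents; a first-order method that treats documents independently would not see that link.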


Keywords: machine learning, text classification, higher order learning, statistical relational learning, higher order naive Bayes, higher order support vector machine



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Murat C. Ganiz (1, 3)
  • Nikita I. Lytkin (2)
  • William M. Pottenger (2, 3)
  1. Department of Computer Science, Lehigh University, USA
  2. Department of Computer Science, Rutgers, The State University of New Jersey, USA
  3. DIMACS, Rutgers, The State University of New Jersey, USA
