Leveraging Higher Order Dependencies between Features for Text Classification

  • Murat C. Ganiz
  • Nikita I. Lytkin
  • William M. Pottenger
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5781)

Abstract

Traditional machine learning methods only consider relationships between feature values within individual data instances while disregarding the dependencies that link features across instances. In this work, we develop a general approach to supervised learning by leveraging higher-order dependencies between features. We introduce a novel Bayesian framework for classification named Higher Order Naive Bayes (HONB). Unlike approaches that assume data instances are independent, HONB leverages co-occurrence relations between feature values across different instances. Additionally, we generalize our framework by developing a novel data-driven space transformation that allows any classifier operating in vector spaces to take advantage of these higher-order co-occurrence relations. Results obtained on several benchmark text corpora demonstrate that higher-order approaches achieve significant improvements in classification accuracy over the baseline (first-order) methods.
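The core idea can be illustrated with a small sketch. The code below is a minimal, hypothetical illustration of higher-order co-occurrence, not the authors' exact HONB algorithm or their transformation: it builds a first-order term co-occurrence matrix from a binary document-term matrix, derives second-order links (terms connected through a shared intermediate term across documents), and blends them into enriched document vectors that any vector-space classifier could consume. The blending weight and normalization are illustrative assumptions.

```python
import numpy as np

# Toy binary document-term matrix: 4 documents x 5 terms.
# X[d, t] = 1 if term t occurs in document d.
X = np.array([
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
], dtype=float)

# First-order co-occurrence: terms appearing together in the same document.
C1 = X.T @ X
np.fill_diagonal(C1, 0)

# Second-order co-occurrence: terms linked via a shared intermediate term,
# i.e. a "higher-order path" of length two across documents. Terms 0 and 2
# never co-occur directly here, but both co-occur with term 1.
C2 = C1 @ C1
np.fill_diagonal(C2, 0)

# Illustrative data-driven transformation: enrich each document vector with
# the (normalized) higher-order neighborhoods of its terms, so a standard
# vector-space classifier such as an SVM can exploit the extra structure.
# The 0.5 blending weight is an arbitrary choice for this sketch.
S = C1 + 0.5 * C2
X_ho = X + X @ (S / (S.sum() + 1e-12))

print(X_ho.shape)
```

In this toy example, `C1[0, 2]` is zero (terms 0 and 2 share no document) while `C2[0, 2]` is positive, which is precisely the kind of dependency a first-order model discards.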

Keywords

machine learning · text classification · higher order learning · statistical relational learning · higher order naive bayes · higher order support vector machine


Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Murat C. Ganiz (1, 3)
  • Nikita I. Lytkin (2)
  • William M. Pottenger (2, 3)

  1. Department of Computer Science, Lehigh University, USA
  2. Department of Computer Science, Rutgers, The State University of New Jersey, USA
  3. DIMACS, Rutgers, The State University of New Jersey, USA