Using Natural Language Processing to Improve Document Categorization with Associative Networks

  • Niels Bloom
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7337)

Abstract

Associative networks are a connectionist language model capable of handling large sets of documents. In this research we investigated the use of natural language processing techniques (part-of-speech tagging and parsing) in combination with associative networks for document categorization and compared the results to a TF-IDF baseline. By filtering out unwanted observations and preselecting relevant data based on sentence structure, natural language processing can pre-filter information before it enters the associative network, thus improving results.
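
The pre-filtering idea and the TF-IDF baseline mentioned in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the toy corpus, the hand-assigned Penn Treebank-style tags, the choice to keep only nouns and verbs, and the names `TAGGED_DOCS`, `KEEP_PREFIXES`, `pos_filter` and `tf_idf` are all assumptions made for illustration. In practice the tags would come from a part-of-speech tagger, and the filtered tokens would feed an associative network rather than the plain TF-IDF scoring shown here.

```python
import math
from collections import Counter

# Hypothetical toy corpus of (token, part-of-speech tag) pairs.
# The tags are assigned by hand here; a real pipeline would use a tagger.
TAGGED_DOCS = {
    "doc_a": [("the", "DT"), ("parser", "NN"), ("reads", "VBZ"),
              ("the", "DT"), ("sentence", "NN")],
    "doc_b": [("the", "DT"), ("network", "NN"), ("links", "VBZ"),
              ("the", "DT"), ("words", "NNS")],
}

# Assumption: only nouns (NN*) and verbs (VB*) are treated as relevant.
KEEP_PREFIXES = ("NN", "VB")


def pos_filter(tagged_tokens, keep_prefixes=KEEP_PREFIXES):
    """Keep only tokens whose tag starts with one of the given prefixes."""
    return [tok for tok, tag in tagged_tokens if tag.startswith(keep_prefixes)]


def tf_idf(docs):
    """Compute tf * log(N / df) for every term of every document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per document
    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


# Pre-filter first, then score only what survives the filter.
filtered = {name: pos_filter(toks) for name, toks in TAGGED_DOCS.items()}
scores = tf_idf(filtered)
```

In this toy corpus the determiner "the" never reaches the scoring step, so it cannot influence categorization. This mirrors, in a very simplified form, the claim that filtering by sentence structure removes unwanted observations before they enter the model.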

Keywords

Associative Networks, WordNet, Stanford Natural Language Parser, Natural Language Processing, Document Categorization

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Niels Bloom (1, 2)
  1. Pagelink Interactives, Hengelo, The Netherlands
  2. University of Twente, Enschede, The Netherlands
