Skip to main content

Use of a Weighted Topic Hierarchy for Document Classification

  • Conference paper
  • First Online:
Book cover Text, Speech and Dialogue (TSD 1999)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 1692))

Included in the following conference series:

Abstract

A statistical method of document classification driven by a hierarchical topic dictionary is proposed. The method uses a dictionary with a simple structure and is insensible to inaccuracies in the dictionary. Two kinds of weights of dictionary entries, namely, relevance and discrimination weights are discussed. The first type of weights is associated with the links between words and topics and between the nodes in the tree, while the weights of the second type depend on user database. A common sense-complaint way of assignment of these weights to the topics is presented. A system for text classification Classifier based on the discussed method is described.

The work partially supported by DEPI-IPN, CONACyT (26424-A), and REDII, Mexico.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Anderson, J. D., Rowley, F. A.: Building End-user Thesauri from Full Text. In: Kwasnik, B. H., Fidel, R. (eds.): Advances in Classification Research. Proceedings of the 2nd ASIS SIG/CR Classification Research Workshop, Vol. 2. Learned Information, Medford, NJ. (1992) 1–13.

    Google Scholar 

  2. Cohen, W. W.: Learning Trees and Rules with Setvalued Features. In: Proceedings of the Thirteenth National Conference on Artificial Intelligence (1996).

    Google Scholar 

  3. Cohen, W., Singer, Y.: Context-sensitive Learning Methods for Text Categorization. In: SIGIR’96 (1996).

    Google Scholar 

  4. Gelbukh, A.: Using a Semantic Network for Lexical and Syntactic Disambiguation. In: Proceedings of Simposium Internacional de Computación: Nuevas Aplicaciones e Innovaciones Tecnológicas en Computación. Mexico (1997) 352–366.

    Google Scholar 

  5. Guzmán-Arenas, A.: Finding the Main Themes in a Spanish Document. Journal Expert Systems with Applications 14 (1, 2) (1998) 139–148.

    Article  Google Scholar 

  6. Guzmán-Arenas, A.: Hallando los Temas Principales en un Artículo en Español. Soluciones Avanzadas 5 (45) (1997) 58, 5 (49) (1997). 66

    Google Scholar 

  7. Jacob, E. K.: Cognition and Classification: A Crossdisciplinary Approach to a Philosophy of Classification. (Abstract.) In: Maxian, B. (ed.): ASIS’ 94: Proceedings of the 57th ASIS Annual Meeting. Medford, NJ: Learned Information (1994) 82.

    Google Scholar 

  8. Krowetz, B.: Homonymy and Polysemy in Information Retrieval. In: Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (1997) 72–79.

    Google Scholar 

  9. Lewis, D. D., Ringuette, M.: A Comparison of Two Learning Algorithms for Text Categorization. In: Third Annual Symposium on Document Analysis and Information Retrieval (1994) 81–93.

    Google Scholar 

  10. Riloff, E., Shepherd, J.: A Corpus Based Approach for Building Semantic Lexicons. In: Proceedings of the Second Conference on Empirical Methods in Natural Language Processing (EMNLP-2) (1997).

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gelbukh, A., Sidorov, G., Guzman-Arénas, A. (1999). Use of a Weighted Topic Hierarchy for Document Classification. In: Matousek, V., Mautner, P., Ocelíková, J., Sojka, P. (eds) Text, Speech and Dialogue. TSD 1999. Lecture Notes in Computer Science(), vol 1692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48239-3_24

Download citation

  • DOI: https://doi.org/10.1007/3-540-48239-3_24

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66494-9

  • Online ISBN: 978-3-540-48239-0

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics