A Document Classification Algorithm Using the Fuzzy Set Theory and Hierarchical Structure of Document

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3043)

Abstract

At present, information retrieval systems are typically driven by combinations of keyword and phrase searches, using direct keyword matching to find the information users need. However, Web document retrieval systems return too many documents because of term ambiguity. It also often happens that words with several meanings occur in a document in a context quite different from the one the querying person expects. As a result, the user needs extra time and effort to find more closely matching documents. To overcome these problems, in this paper we propose a content-based information retrieval system that connects documents according to their degree of semantic link, expressed as a fuzzy value by a fuzzy function. We also propose an algorithm that produces a hierarchical structure using the degree of concepts and contents among documents. As a result, we are able to select and provide the documents that interest the user.
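The abstract does not give the fuzzy function or the hierarchy-building algorithm themselves, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes each document is a fuzzy set of terms (membership = term frequency divided by the highest term frequency in that document), takes a fuzzy Jaccard ratio as the degree of link between two documents, and derives nested groups from alpha-cuts of that relation. The names `fuzzy_doc`, `fuzzy_link`, `alpha_cut_groups`, and `hierarchy` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def fuzzy_doc(text):
    """Assumed representation: fuzzy set of terms, membership = tf / max tf."""
    counts = Counter(text.lower().split())
    if not counts:
        return {}
    top = max(counts.values())
    return {term: n / top for term, n in counts.items()}


def fuzzy_link(a, b):
    """Assumed link degree in [0, 1]: sum of min memberships / sum of max."""
    terms = set(a) | set(b)
    inter = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in terms)
    union = sum(max(a.get(t, 0.0), b.get(t, 0.0)) for t in terms)
    return inter / union if union else 0.0


def alpha_cut_groups(docs, alpha):
    """Group documents connected by links of degree >= alpha (union-find)."""
    sets = {name: fuzzy_doc(text) for name, text in docs.items()}
    parent = {name: name for name in docs}

    def find(x):
        # Follow parent pointers to the group representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for x, y in combinations(docs, 2):
        if fuzzy_link(sets[x], sets[y]) >= alpha:
            parent[find(x)] = find(y)

    groups = {}
    for name in docs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


def hierarchy(docs, alphas=(1.0, 0.5, 0.0)):
    """Lower alpha merges more documents, giving nested levels."""
    return {alpha: alpha_cut_groups(docs, alpha) for alpha in alphas}


docs = {
    "d1": "fuzzy set theory",
    "d2": "fuzzy set logic",
    "d3": "web search engine",
}
levels = hierarchy(docs)
```

With these three toy documents, `d1` and `d2` share two of four distinct terms, so their link degree is 0.5 and they merge at the 0.5 level, while `d3` joins only at alpha 0. Because groups can only merge as alpha decreases, the levels nest into a hierarchy; a production system would need a semantic (not purely lexical) membership function to address the term-ambiguity problem the abstract describes.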

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Han, SW., Eun, HJ., Kim, YS., Kóczy, L.T. (2004). A Document Classification Algorithm Using the Fuzzy Set Theory and Hierarchical Structure of Document. In: Laganá, A., Gavrilova, M.L., Kumar, V., Mun, Y., Tan, C.J.K., Gervasi, O. (eds) Computational Science and Its Applications – ICCSA 2004. ICCSA 2004. Lecture Notes in Computer Science, vol 3043. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24707-4_16

  • DOI: https://doi.org/10.1007/978-3-540-24707-4_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22054-1

  • Online ISBN: 978-3-540-24707-4

  • eBook Packages: Springer Book Archive
