A fuzzy approach to classification of text documents

Journal of Computer Science and Technology

Abstract

This paper addresses the classification of text documents. Using the concept of the proximity degree, the set of words is partitioned into equivalence classes. In particular, the concepts of the semantic field and the association degree are introduced. Building on these concepts, the paper presents a fuzzy classification approach for document categorization. Furthermore, by applying the entropy of information, it derives methods for selecting key words from the set of words that covers the classification of documents and for constructing a hierarchical structure of key words.
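
The abstract does not give the paper's formal definitions, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It assumes the proximity degree is a symmetric score in [0, 1] between word pairs, and that equivalence classes are obtained by thresholding that score at a cut level `lam` and merging words linked by a chain of sufficiently close pairs. It also assumes that "entropy of information" can be illustrated by the standard information gain of a word with respect to document labels. The function names, the threshold, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
from math import log2


def partition_words(words, proximity, lam):
    """Group words into classes linked by chains of proximity >= lam.

    `proximity` maps (word_a, word_b) pairs to an assumed degree in [0, 1].
    Merging is done with a small union-find structure.
    """
    parent = {w: w for w in words}

    def find(w):
        # Follow parent links to the class representative (path halving).
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for (a, b), degree in proximity.items():
        if degree >= lam:
            parent[find(a)] = find(b)

    classes = defaultdict(set)
    for w in words:
        classes[find(w)].add(w)
    # Sorted output so the result is deterministic.
    return sorted(sorted(c) for c in classes.values())


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(word, docs):
    """Reduction in label entropy from knowing whether `word` occurs.

    `docs` is a list of (set_of_words, label) pairs.
    """
    labels = [label for _, label in docs]
    present = [label for ws, label in docs if word in ws]
    absent = [label for ws, label in docs if word not in ws]
    remainder = sum(
        len(part) / len(docs) * entropy(part) for part in (present, absent) if part
    )
    return entropy(labels) - remainder


if __name__ == "__main__":
    words = ["car", "auto", "truck", "apple", "pear"]
    proximity = {
        ("car", "auto"): 0.9,
        ("auto", "truck"): 0.7,
        ("apple", "pear"): 0.8,
        ("car", "apple"): 0.1,
    }
    print(partition_words(words, proximity, lam=0.6))

    docs = [
        ({"goal", "match"}, "sport"),
        ({"goal", "team"}, "sport"),
        ({"stock", "market"}, "finance"),
        ({"stock", "team"}, "finance"),
    ]
    # Rank candidate key words by how much they reduce label uncertainty.
    for w in ("goal", "team"):
        print(w, information_gain(w, docs))
```

Raising `lam` yields finer classes and lowering it yields coarser ones. Ranking words by information gain is one common way to choose key words, and applying it recursively to each resulting subset of documents is one way a hierarchy of key words could be built; whether the paper proceeds this way is not stated in the abstract.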

Additional information

This work is supported by the National Natural Science Foundation of China (Grant No. 50263006), the Foundation of the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences (Grant No. HP2002-2), and the Yunnan Natural Science Foundation (Grant No. 2002F0063M).

LIU WeiYi graduated from Huazhong University of Science and Technology in 1976. He was a research fellow at Hong Kong City University. Currently, he is a professor in the Department of Computer Science of Yunnan University. His research interests include fuzzy systems and data and knowledge engineering. He is a member of the IEEE Computer Society.

SONG Ning received the M.S. degree from Kunming University of Science and Technology in 1993. She is an associate professor of the Department of Metallurgical Engineering. Her current research interests include data and knowledge engineering.

Liu, W., Song, N. A fuzzy approach to classification of text documents. J. Comput. Sci. & Technol. 18, 640–647 (2003). https://doi.org/10.1007/BF02947124

  • DOI: https://doi.org/10.1007/BF02947124
