Journal of Computer Science and Technology

Volume 18, Issue 5, pp 640–647

A fuzzy approach to classification of text documents

  • Liu WeiYi 
  • Song Ning 


This paper addresses the classification of text documents. Based on the concept of the proximity degree, the set of words is partitioned into equivalence classes. In particular, the concepts of the semantic field and the association degree are introduced. Building on these concepts, the paper presents a fuzzy classification approach for document categorization. Furthermore, by applying the concept of information entropy, it derives methods for selecting key words from the set of words covering the classification of documents and for constructing the hierarchical structure of key words.
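The abstract does not give the paper's definitions, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It assumes the proximity degree is a symmetric function returning a value in [0, 1], and that words are grouped by a threshold cut: two words share a class when a chain of sufficiently close pairs connects them (for a reflexive, symmetric fuzzy relation this matches a cut of its max-min transitive closure). For key word selection it uses standard information gain as a stand-in for the entropy-based criterion. The names `partition_by_proximity`, `information_gain`, `proximity`, and `threshold` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def partition_by_proximity(words, proximity, threshold):
    """Group words into classes: two words share a class when a chain of
    pairs, each with proximity >= threshold, connects them.
    Assumes `proximity(a, b)` is symmetric and returns a value in [0, 1]."""
    parent = {w: w for w in words}

    def find(w):
        # Follow parent links to the class representative (path halving).
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if proximity(a, b) >= threshold:
                parent[find(a)] = find(b)  # merge the two classes

    classes = {}
    for w in words:
        classes.setdefault(find(w), []).append(w)
    return sorted(sorted(c) for c in classes.values())


def entropy(items):
    """Shannon entropy (in bits) of the label distribution in `items`."""
    n = len(items)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def information_gain(docs, labels, word):
    """Reduction in label entropy from knowing whether `word` occurs.
    `docs` is a list of word sets; `labels` holds one category per document."""
    present = [lab for d, lab in zip(docs, labels) if word in d]
    absent = [lab for d, lab in zip(docs, labels) if word not in d]
    n = len(labels)
    return (entropy(labels)
            - (len(present) / n) * entropy(present)
            - (len(absent) / n) * entropy(absent))


# Toy data, invented purely for illustration.
TOY_PROXIMITY = {
    frozenset(("car", "auto")): 0.9,
    frozenset(("auto", "vehicle")): 0.8,
    frozenset(("bank", "loan")): 0.7,
}


def toy_proximity(a, b):
    return TOY_PROXIMITY.get(frozenset((a, b)), 0.0)


if __name__ == "__main__":
    words = ["car", "auto", "vehicle", "bank", "loan"]
    print(partition_by_proximity(words, toy_proximity, 0.75))
    docs = [{"car", "loan"}, {"car"}, {"bank"}, {"loan", "bank"}]
    labels = ["auto", "auto", "finance", "finance"]
    ranked = sorted(["car", "loan", "bank"],
                    key=lambda w: information_gain(docs, labels, w),
                    reverse=True)
    print(ranked)
```

Lowering `threshold` merges classes and raising it splits them, so the cut level controls how coarse the partition is. Ranking words by information gain, as in the usage lines, is one plausible way to pick key words; the paper's own selection rule may differ.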


text document classification fuzzy approach semantic association 





Copyright information

© Science Press, Beijing China and Allerton Press Inc. 2003

Authors and Affiliations

  • Liu WeiYi (1, 2)
  • Song Ning (3)
  1. Department of Computer Science, Yunnan University, Kunming, P.R. China
  2. The Key Laboratory of Intelligent Information Processing, Institute of Computer Technology, The Chinese Academy of Sciences, Beijing, P.R. China
  3. Department of Metallurgy, Kunming University of Science and Technology, Kunming, P.R. China
