Pattern Extraction Method for Text Classification

  • Hung Son Nguyen
  • Hui Wang
Part of the Studies in Fuzziness and Soft Computing book series (STUDFUZZ, volume 89)


The quality of classification can be improved by applying a feature extraction algorithm, i.e. an algorithm that finds new and more relevant features, before the learning procedure. In this paper, we investigate a novel feature extraction method for textual data. Usually, texts (documents) are represented as collections of words or keywords. We present a method for finding new numerical attributes that improve the quality of classification. Each new feature is based on a set of words (a text pattern) and is defined as the number of words occurring in both the text pattern and the considered document. Our approach is based on rough set methods and Lattice Machine theory. The experimental results show that the presented methods improve classification quality on almost all of the tested textual data.
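
The feature definition above (the number of words shared by a text pattern and a document) can be illustrated with a minimal sketch. The whitespace tokenizer, the function names, and the two example patterns are assumptions made for illustration only; the sketch does not show how patterns are found with rough set or Lattice Machine methods.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def pattern_feature(pattern, document):
    """Count the distinct words occurring in both the pattern and the document."""
    return len(set(pattern) & set(tokenize(document)))


def extract_features(patterns, document):
    """Map a document to one numerical attribute per text pattern."""
    return [pattern_feature(p, document) for p in patterns]


# Hypothetical text patterns (sets of words), not taken from the paper.
patterns = [
    {"stock", "market", "price"},
    {"team", "match", "goal"},
]

doc = "The stock price rose as the market opened"
features = extract_features(patterns, doc)
print(features)
```

Each document is thus mapped to a short numerical vector, one entry per pattern, which could then be passed to any standard learning procedure in place of, or alongside, the raw word representation.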


Text Classification Textual Data Decision Table Pattern Text Pattern Extraction 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Hung Son Nguyen (1)
  • Hui Wang (2)
  1. Institute of Mathematics, Warsaw University, Warsaw, Poland
  2. School of Information and Software Engineering, University of Ulster at Jordanstown, Northern Ireland
