Pattern Extraction Method for Text Classification
Classification quality can often be improved by applying a feature extraction algorithm, i.e. an algorithm that finds new and more relevant features, before the learning procedure. In this paper, we investigate a novel feature extraction method for textual data. Texts (documents) are usually represented as collections of words or keywords. We present a method for finding new numerical attributes that improve the quality of classification. Each new feature is based on a set of words (a text pattern) and is defined as the number of words that occur in both the text pattern and the document under consideration. Our approach is based on rough set methods and Lattice Machine theory. The experimental results show that the presented methods improve classification quality on almost all of the textual data considered.
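The feature definition above can be illustrated with a minimal sketch. It assumes that "number of words occurring in both" means the count of distinct shared words, and that documents are tokenized by whitespace; both are assumptions, as are the function names `pattern_feature` and `extract_features` and the example patterns. How the patterns themselves are discovered (the rough set and Lattice Machine part of the method) is not shown here.

```python
def pattern_feature(pattern, document_words):
    """Count the distinct words shared by a text pattern and a document.

    Assumption: the feature counts distinct words, not repeated occurrences.
    """
    return len(set(pattern) & set(document_words))


def extract_features(patterns, document_words):
    """Map one document to a numerical vector, one attribute per text pattern."""
    return [pattern_feature(p, document_words) for p in patterns]


# Hypothetical text patterns (sets of words) chosen only for illustration.
patterns = [
    {"rough", "set", "lattice"},
    {"text", "classification"},
]

# Assumption: simple whitespace tokenization of a lowercase document.
document = "rough set methods improve text retrieval".split()

features = extract_features(patterns, document)
print(features)  # [2, 1]
```

Each document thus becomes a short numerical vector that can be appended to, or used in place of, the usual word-based representation before a learning algorithm is applied.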
Keywords: Text Classification Textual Data Decision Table Pattern Text Pattern Extraction