Abstract
This paper presents a study in which feature selection algorithms were evaluated in order to improve document classification performance. The study was carried out within the DEEPSIA project (IST project Nr. 1999-20 283), funded by the European Union. Better document recognition was required to raise the overall performance of the Framework for Internet data collection based on intelligent agents that is used within the project. The Framework is briefly described and the learning techniques used are presented. The focus of this paper is on the feature selection algorithms; the most relevant work was the use of Conditional Mutual Information, estimated using genetic algorithms, since the computational complexity of \( C_K^N \) ruled out an iterative approach. Methods, techniques and comparative results are presented in detail.
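The combinatorial argument in the abstract can be illustrated with a minimal sketch. It assumes that \( C_K^N \) denotes the number of ways to choose K features out of N, and it scores a subset by a plug-in estimate of the mutual information between the selected features and the class. This is a stand-in criterion chosen for brevity, not the authors' Conditional Mutual Information estimator. The function names, the genetic operators, their parameters and the synthetic data are all hypothetical.

```python
import math
import random
from collections import Counter

def joint_mutual_information(rows, labels, subset):
    """Plug-in estimate of I(X_S; Y) in bits for discrete features (assumed criterion)."""
    n = len(rows)
    keys = [tuple(row[j] for j in subset) for row in rows]
    count_x = Counter(keys)
    count_y = Counter(labels)
    count_xy = Counter(zip(keys, labels))
    return sum((c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
               for (x, y), c in count_xy.items())

def ga_select(rows, labels, k, generations=30, pop_size=20, seed=0):
    """Search for a k-feature subset with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    n_features = len(rows[0])

    def fitness(subset):
        return joint_mutual_information(rows, labels, subset)

    # Each individual is a sorted tuple of k distinct feature indices.
    population = [tuple(sorted(rng.sample(range(n_features), k)))
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]      # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            pool = list(set(a) | set(b))           # crossover: mix both parents
            child = set(rng.sample(pool, k))
            if rng.random() < 0.3:                 # mutation: swap one feature
                child.discard(rng.choice(sorted(child)))
                outside = [j for j in range(n_features) if j not in child]
                child.add(rng.choice(outside))
            children.append(tuple(sorted(child)))
        population = parents + children
    return max(population, key=fitness)

# Synthetic data: 8 binary features, class = XOR of features 0 and 1,
# so neither feature is informative alone but the pair is.
data_rng = random.Random(1)
rows = [[data_rng.randint(0, 1) for _ in range(8)] for _ in range(200)]
labels = [r[0] ^ r[1] for r in rows]

best = ga_select(rows, labels, k=2)
print("subsets of size 2 out of 8:", math.comb(8, 2))
print("subsets of size 10 out of 1000:", math.comb(1000, 10))
print("subset picked by the GA:", best)
```

With 8 features an exhaustive scan would still be cheap. The second count printed above shows why enumerating every subset stops being practical at realistic vocabulary sizes, which is the situation in which a heuristic search such as a genetic algorithm becomes attractive.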
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sousa, P.A.C., Pimentão, J.P., Santos, B.R.D., Moura-Pires, F. (2003). Feature Selection Algorithms to Improve Documents’ Classification Performance. In: Menasalvas, E., Segovia, J., Szczepaniak, P.S. (eds) Advances in Web Intelligence. AWIC 2003. Lecture Notes in Computer Science, vol 2663. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44831-4_30
DOI: https://doi.org/10.1007/3-540-44831-4_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40124-7
Online ISBN: 978-3-540-44831-0
eBook Packages: Springer Book Archive