Abstract
This work describes and evaluates a Bayesian feature selection approach for classification problems. A Bayesian network is generated from a dataset, and the Markov Blanket of the class variable is then used as the selected feature subset. The proposed methodology is illustrated by simulations on three datasets that are benchmarks for data mining methods: Wisconsin Breast Cancer, Mushroom and Congressional Voting Records. Three classifiers were employed to assess the efficacy of the proposed method. The average classification rates obtained with all features are compared to those achieved with only the features that belong to the Markov Blanket. The performed simulations lead to interesting results.
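As a rough illustration of the selection step, the sketch below extracts the Markov Blanket of a class variable from a network that has already been learned. It uses the standard definition (parents, children, and the other parents of those children) and does not reproduce the structure-learning procedure used in the paper. The function name `markov_blanket`, the parent-set dictionary representation, and the toy network with nodes `A` to `E` are all hypothetical and chosen only for this example.

```python
def markov_blanket(parents, target):
    """Return the Markov Blanket of `target` in a DAG.

    `parents` maps each node to the set of its parent nodes
    (an assumed representation, not the paper's data structure).
    """
    # Parents of the target.
    blanket = set(parents.get(target, ()))
    # Children: every node that lists the target among its parents.
    children = {node for node, ps in parents.items() if target in ps}
    blanket |= children
    # Spouses: the other parents of each child.
    for child in children:
        blanket |= set(parents[child])
    # The target itself is never part of its own blanket.
    blanket.discard(target)
    return blanket


# Hypothetical toy network: A -> Class -> B <- C, C -> D, B -> E.
toy_parents = {
    "A": set(),
    "C": set(),
    "Class": {"A"},
    "B": {"Class", "C"},
    "D": {"C"},
    "E": {"B"},
}

selected_features = sorted(markov_blanket(toy_parents, "Class"))
print(selected_features)
```

In this toy network the features `D` and `E` fall outside the blanket, so a classifier would be trained only on the remaining columns.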
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hruschka, E.R., Hruschka, E.R., Ebecken, N.F.F. (2004). Feature Selection by Bayesian Networks. In: Tawfik, A.Y., Goodwin, S.D. (eds) Advances in Artificial Intelligence. Canadian AI 2004. Lecture Notes in Computer Science, vol 3060. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24840-8_26
DOI: https://doi.org/10.1007/978-3-540-24840-8_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22004-6
Online ISBN: 978-3-540-24840-8
eBook Packages: Springer Book Archive