Abstract
We consider the problem of selecting the subset of features that is most significant for distinguishing two given data sets. The selection criterion to be maximized is the symmetric information distance between the distributions of the feature subset in the two classes. These distributions are estimated in a Bayesian way with uniform priors, and the symmetric information distance is assessed through a lower estimate of the corresponding average risk functional, derived using the Rademacher penalty and inequalities from empirical process theory. The approach was applied to a real example of selecting manufacturing process parameters to predict which of two states the process is in. Only 2 of the 10 parameters proved sufficient to recognize the true state of the process with an error level of 8%. The parameter set was found from 550 independent observations in the training sample; performance of the approach was evaluated on 270 independent observations in the test sample.
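To make the criterion concrete, the following minimal Python sketch (an illustration, not the authors' implementation) scores a feature subset by the symmetric Kullback-Leibler distance between its class-conditional distributions, each estimated from histogram counts with a uniform (Laplace) prior as a simple stand-in for the Bayesian estimate above. The Rademacher-penalty lower estimate used in the paper is omitted here, and all function names are hypothetical.

import numpy as np
from itertools import combinations

def subset_distribution(X, features, n_bins, alpha=1.0):
    # Joint distribution of the chosen features, estimated from
    # histogram counts with a uniform (Laplace) prior of strength alpha.
    cells = np.ravel_multi_index(tuple(X[:, f] for f in features),
                                 dims=(n_bins,) * len(features))
    counts = np.bincount(cells, minlength=n_bins ** len(features)).astype(float)
    return (counts + alpha) / (counts.sum() + alpha * counts.size)

def symmetric_kl(p, q):
    # Symmetric information distance J(p, q) = KL(p||q) + KL(q||p);
    # the uniform prior keeps both estimated distributions strictly positive.
    return float(np.sum((p - q) * np.log(p / q)))

def best_subset(X0, X1, n_bins, size):
    # Exhaustive search for the subset of 'size' features that
    # maximizes the distance between the two class distributions.
    best, best_score = None, -np.inf
    for subset in combinations(range(X0.shape[1]), size):
        score = symmetric_kl(subset_distribution(X0, subset, n_bins),
                             subset_distribution(X1, subset, n_bins))
        if score > best_score:
            best, best_score = subset, score
    return best, best_score

# Toy usage with randomly binned data: 10 features, 4 discrete levels.
rng = np.random.default_rng(0)
X0 = rng.integers(0, 4, size=(550, 10))
X1 = rng.integers(0, 4, size=(270, 10))
print(best_subset(X0, X1, n_bins=4, size=2))

With features discretized into a few bins, the exhaustive search over two-feature subsets mirrors the scale of the application above; for larger subset sizes a greedy or penalized search would replace the combinatorial loop.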
Copyright information
© 2014 Springer International Publishing Switzerland
Cite this paper
Tsurko, V.V., Michalski, A.I. (2014). Feature Selection by Distributions Contrasting. In: Agre, G., Hitzler, P., Krisnadhi, A.A., Kuznetsov, S.O. (eds) Artificial Intelligence: Methodology, Systems, and Applications. AIMSA 2014. Lecture Notes in Computer Science, vol. 8722. Springer, Cham. https://doi.org/10.1007/978-3-319-10554-3_13
DOI: https://doi.org/10.1007/978-3-319-10554-3_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10553-6
Online ISBN: 978-3-319-10554-3