Feature Selection by Distributions Contrasting

  • Conference paper
Artificial Intelligence: Methodology, Systems, and Applications (AIMSA 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8722)


Abstract

We consider the problem of selecting the subset of features that is most significant for discriminating between two given data sets. The selection criterion to be maximized is the symmetric information distance between the distributions of the feature subset in the two classes. These distributions are estimated with a Bayesian approach under uniform priors, and the symmetric information distance is bounded from below through the corresponding average risk functional, using the Rademacher penalty and inequalities from the theory of empirical processes. The approach was applied to a real problem: selecting a set of manufacturing process parameters to predict which of two states the process is in. Only 2 of the 10 parameters turned out to be enough to recognize the true state of the process with an error level of 8%. The parameter set was found from 550 independent observations in the training sample; performance was evaluated on 270 independent observations in the test sample.
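To make the criterion concrete: the symmetric information distance between discrete distributions P and Q is the Jeffreys divergence J(P, Q) = KL(P‖Q) + KL(Q‖P) = Σ_x (P(x) − Q(x)) log(P(x)/Q(x)). Below is a minimal Python sketch of selection by this criterion, assuming the features have already been discretized into integer bins. The Laplace smoothing with alpha = 1 plays the role of the uniform Bayesian prior mentioned in the abstract; the Rademacher penalty term that the paper subtracts from the empirical score to guard against overfitting is omitted here. All function and parameter names are illustrative, not taken from the paper.

```python
import numpy as np
from itertools import combinations

def symmetric_kl(p, q):
    """Jeffreys (symmetric KL) divergence between two discrete distributions."""
    return float(np.sum((p - q) * np.log(p / q)))

def class_distribution(X, subset, n_bins, alpha=1.0):
    """Estimate the joint distribution of a feature subset on one class.

    X holds integer bin indices in [0, n_bins) per feature; alpha=1.0 is
    Laplace smoothing, i.e. a uniform Dirichlet prior on the cell probabilities.
    """
    # Encode each row's subset values as a single joint-cell index.
    cells = np.zeros(len(X), dtype=np.int64)
    for j in subset:
        cells = cells * n_bins + X[:, j]
    counts = np.bincount(cells, minlength=n_bins ** len(subset)).astype(float)
    # Posterior mean of cell probabilities under the uniform prior.
    return (counts + alpha) / (counts.sum() + alpha * counts.size)

def select_subset(X0, X1, n_bins, max_size=2):
    """Score feature subsets by symmetric KL between the two classes."""
    n_features = X0.shape[1]
    best, best_score = None, -np.inf
    for k in range(1, max_size + 1):
        for subset in combinations(range(n_features), k):
            p = class_distribution(X0, subset, n_bins)
            q = class_distribution(X1, subset, n_bins)
            score = symmetric_kl(p, q)
            if score > best_score:
                best, best_score = subset, score
    return best, best_score
```

At the scale of the paper's application such an exhaustive search is cheap (with 10 parameters there are only 45 pairs to score); with many features the inner loop would typically be replaced by a greedy forward selection.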




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Tsurko, V.V., Michalski, A.I. (2014). Feature Selection by Distributions Contrasting. In: Agre, G., Hitzler, P., Krisnadhi, A.A., Kuznetsov, S.O. (eds) Artificial Intelligence: Methodology, Systems, and Applications. AIMSA 2014. Lecture Notes in Computer Science, vol 8722. Springer, Cham. https://doi.org/10.1007/978-3-319-10554-3_13


  • DOI: https://doi.org/10.1007/978-3-319-10554-3_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10553-6

  • Online ISBN: 978-3-319-10554-3

  • eBook Packages: Computer Science (R0)
