Feature Selection for High-Dimensional Data — A Pearson Redundancy Based Filter

  • Conference paper

Part of the book series: Advances in Soft Computing (AINSC, volume 45)

Abstract

An information filtering algorithm based on the Pearson χ² test has been implemented and tested on feature selection. The test is frequently used in biomedical data analysis and should be applied only to nominal (discretized) features. The algorithm has a single parameter: the statistical confidence level at which two distributions are considered identical. Empirical comparisons with four other state-of-the-art feature selection algorithms (FCBF, CorrSF, ReliefF and ConnSF) are very encouraging.
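The abstract does not spell out the procedure, so the following is only a minimal sketch of how a χ²-based redundancy filter of this kind might look, not the authors' implementation. It assumes that features are already discretized, that relevance is ranked by the χ² statistic of the feature-by-class contingency table, and that a lower-ranked feature is treated as redundant when a χ² homogeneity test cannot reject, at significance level `alpha`, that its value distribution matches that of an already selected feature. The function names, the ranking index, and the toy data are all illustrative assumptions.

```python
import math
from collections import Counter


def chi2_sf(stat, dof):
    """P(X > stat) for a chi-square variable with integer dof (closed form)."""
    if dof <= 0:
        return 1.0
    x = stat / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x) * sum_{i < dof/2} x^i / i!
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= x / i
            total += term
        return math.exp(-x) * total
    # Odd dof: erfc(sqrt(x)) plus half-integer gamma terms
    total = math.erfc(math.sqrt(x))
    for i in range(1, (dof - 1) // 2 + 1):
        total += math.exp(-x) * x ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def chi2_table(table):
    """Pearson chi-square statistic and dof for a table given as a list of rows."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, (len(table) - 1) * (len(col_sums) - 1)


def relevance(feature, labels):
    """Assumed ranking index: chi-square statistic of feature vs. class."""
    counts = Counter(zip(feature, labels))
    values, classes = sorted(set(feature)), sorted(set(labels))
    table = [[counts[(v, c)] for c in classes] for v in values]
    return chi2_table(table)[0]


def same_distribution(f, g, alpha):
    """True if the test cannot reject that f and g share one value distribution."""
    cf, cg = Counter(f), Counter(g)
    values = sorted(set(cf) | set(cg))
    table = [[cf[v] for v in values], [cg[v] for v in values]]
    stat, dof = chi2_table(table)
    return chi2_sf(stat, dof) > alpha


def select(features, labels, alpha=0.05):
    """Rank features by relevance, then drop those redundant with a kept one."""
    ranked = sorted(features, key=lambda name: relevance(features[name], labels),
                    reverse=True)
    kept = []
    for name in ranked:
        if not any(same_distribution(features[name], features[k], alpha)
                   for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: three discretized features and a binary class.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "f1": [0, 0, 0, 0, 1, 1, 1, 1],
    "f2": [0, 0, 0, 1, 1, 1, 1, 1],  # value distribution close to f1
    "f3": [2, 2, 2, 2, 2, 2, 2, 0],  # clearly different value distribution
}
selected = select(features, labels, alpha=0.05)
print(selected)  # prints ['f1', 'f3']
```

In this toy run `f2` is dropped because its value histogram cannot be distinguished from that of `f1` at the 5% level, while `f3` is kept. Raising `alpha` makes the test reject more readily, so fewer features are declared redundant and more are retained.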

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Biesiada, J., Duch, W. (2007). Feature Selection for High-Dimensional Data — A Pearson Redundancy Based Filter. In: Kurzynski, M., Puchala, E., Wozniak, M., Zolnierek, A. (eds) Computer Recognition Systems 2. Advances in Soft Computing, vol 45. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75175-5_30

  • DOI: https://doi.org/10.1007/978-3-540-75175-5_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75174-8

  • Online ISBN: 978-3-540-75175-5

  • eBook Packages: Engineering, Engineering (R0)
