Feature Selection for High-Dimensional Data — A Pearson Redundancy Based Filter

Biesiada, Jacek; Duch, Wlodzisław

doi:10.1007/978-3-540-75175-5_30


Conference paper


Part of the book series: Advances in Soft Computing (AINSC, volume 45)

Abstract

An algorithm for filtering information based on the Pearson χ² test has been implemented and tested on feature selection. This test is frequently used in biomedical data analysis and should be applied only to nominal (discretized) features. The algorithm has a single parameter: the statistical confidence level at which two distributions are considered identical. Empirical comparisons with four other state-of-the-art feature selection algorithms (FCBF, CorrSF, ReliefF and ConnSF) are very encouraging.
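The abstract does not spell out the procedure, so the following is only a rough sketch of how a χ²-based redundancy filter might look, not the authors' implementation. It assumes that features are already discretized and ranked by relevance, that two features are compared through the standard two-sample χ² statistic for binned data with equal totals, Σ (Rᵢ − Sᵢ)² / (Rᵢ + Sᵢ), and that the confidence level is supplied as a table of critical values per degree of freedom. The names `two_sample_chi2`, `drop_redundant` and `critical` are hypothetical.

```python
from collections import Counter


def two_sample_chi2(a, b):
    """Two-sample Pearson chi-squared statistic for the value histograms of
    two nominal (discretized) features measured on the same samples.

    Returns (statistic, degrees_of_freedom).
    """
    if len(a) != len(b):
        raise ValueError("features must have the same number of samples")
    ra, rb = Counter(a), Counter(b)
    stat = 0.0
    bins = 0
    for value in set(ra) | set(rb):
        r, s = ra.get(value, 0), rb.get(value, 0)
        # r + s > 0 because the value occurs in at least one feature
        stat += (r - s) ** 2 / (r + s)
        bins += 1
    # Equal totals remove one degree of freedom
    return stat, bins - 1


def drop_redundant(ranked, critical):
    """Greedy filter over features ranked by relevance, best first.

    `ranked` is a list of (name, values) pairs. `critical` maps degrees of
    freedom to the chi-squared critical value for the chosen confidence
    level (an assumed lookup table). A feature is dropped when its histogram
    cannot be distinguished from that of a feature that was already kept.
    """
    kept = []
    for name, values in ranked:
        redundant = False
        for _, kept_values in kept:
            stat, dof = two_sample_chi2(values, kept_values)
            if dof == 0 or stat < critical[dof]:
                redundant = True
                break
        if not redundant:
            kept.append((name, values))
    return [name for name, _ in kept]


# Toy data: f2 has the same histogram as f1, f3 is strongly skewed.
f1 = [0] * 10 + [1] * 10
f2 = [1] * 10 + [0] * 10
f3 = [0] * 19 + [1]

# 3.841 is the chi-squared critical value for 1 degree of freedom at the 95% level.
print(drop_redundant([("f1", f1), ("f2", f2), ("f3", f3)], {1: 3.841}))
# prints ['f1', 'f3']
```

In this sketch the only tunable quantity is the critical-value table, which plays the role of the single confidence-level parameter mentioned above. Note that the comparison looks at marginal histograms only, which is a simplification made for illustration.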



Author information

Authors and Affiliations

Division of Computer Methods, Dept. of Electrotechnology, The Silesian University of Technology, Katowice, Poland
Jacek Biesiada
Dept. of Informatics, Nicolaus Copernicus University, Toruń, Poland
Wlodzisław Duch


Editor information

Editors and Affiliations

Faculty of Electronics, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, 50-370, Wroclaw, Poland
Marek Kurzynski, Edward Puchala, Michal Wozniak & Andrzej Zolnierek


About this paper

Cite this paper

Biesiada, J., Duch, W. (2007). Feature Selection for High-Dimensional Data — A Pearson Redundancy Based Filter. In: Kurzynski, M., Puchala, E., Wozniak, M., Zolnierek, A. (eds) Computer Recognition Systems 2. Advances in Soft Computing, vol 45. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75175-5_30


DOI: https://doi.org/10.1007/978-3-540-75175-5_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75174-8
Online ISBN: 978-3-540-75175-5
eBook Packages: Engineering, Engineering (R0)
