Fast-Ensembles of Minimum Redundancy Feature Selection

Schowe, Benjamin; Morik, Katharina

doi:10.1007/978-3-642-22910-7_5


Part of the book series: Studies in Computational Intelligence (SCI, volume 373)


Abstract

Finding relevant subspaces in very high-dimensional data is a challenging task, and not only for microarray data. Feature selection should enhance classification performance, but it must also be stable, i.e., the set of selected features should not change when different subsets of a population are used. Ensemble methods have succeeded in increasing both stability and classification accuracy. However, their runtime prevents them from scaling up to real-world applications. We propose two methods that enhance correlation-based feature selection such that the stability of feature selection comes with little or even no extra runtime. We show the efficiency of the algorithms analytically and empirically on a wide range of datasets.
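The abstract combines three ideas: correlation-based minimum-redundancy selection, stability measured across subsets of the data, and ensembles over such subsets. The following is a minimal Python sketch of those ideas under stated assumptions: absolute Pearson correlation is used for both relevance and redundancy, candidates are scored by relevance minus mean redundancy, stability is the average pairwise Jaccard similarity of subsets chosen on random subsamples, and the ensemble keeps the features selected most often. It is a naive ensemble for illustration, not the authors' Fast-Ensemble methods, and all function names are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def mrmr_select(X, y, k):
    """Greedy min-redundancy/max-relevance selection (assumed difference criterion).

    X is a list of rows, y a list of numeric labels. Relevance of feature j is
    |corr(x_j, y)|; redundancy is the mean |corr| with already selected features.
    Returns feature indices in the order they were selected.
    """
    cols = list(zip(*X))  # column-major view of the data
    k = min(k, len(cols))
    relevance = [abs(pearson(c, y)) for c in cols]
    selected = [max(range(len(cols)), key=lambda j: relevance[j])]
    while len(selected) < k:
        candidates = [j for j in range(len(cols)) if j not in selected]
        # Score = relevance minus average redundancy with the current subset.
        best = max(candidates, key=lambda j: relevance[j] - sum(
            abs(pearson(cols[j], cols[s])) for s in selected) / len(selected))
        selected.append(best)
    return selected


def jaccard(a, b):
    """Jaccard similarity of two feature subsets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def subsample_selections(X, y, k, runs=10, frac=0.9, seed=0):
    """Yield the subset chosen by mrmr_select on each random subsample of the rows."""
    rng = random.Random(seed)
    n = len(X)
    m = max(2, int(frac * n))
    for _ in range(runs):
        idx = rng.sample(range(n), m)
        yield mrmr_select([X[i] for i in idx], [y[i] for i in idx], k)


def stability(X, y, k, runs=10, frac=0.9, seed=0):
    """Average pairwise Jaccard similarity of the subsets (1.0 = perfectly stable)."""
    subsets = list(subsample_selections(X, y, k, runs, frac, seed))
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def ensemble_select(X, y, k, runs=10, frac=0.9, seed=0):
    """Naive ensemble: keep the k features selected most often across subsamples."""
    counts = Counter()
    for subset in subsample_selections(X, y, k, runs, frac, seed):
        counts.update(subset)
    return [j for j, _ in counts.most_common(k)]
```

Note that `ensemble_select` reruns the full selection once per subsample, so its cost grows linearly with `runs`; this is the runtime overhead of ensembles that the abstract identifies as the obstacle to scaling up.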



Author information

Authors and Affiliations

Technische Universität Dortmund, Germany
Benjamin Schowe & Katharina Morik


Editor information

Editors and Affiliations

University of Malmo, Stora Trädgårdsgatan 20, läg 1601, 21128, Malmö, Sweden
Oleg Okun
Department of Computer Science, University of Milan, Via Comelico 39, 20135, Milano, Italy
Giorgio Valentini
Department of Computer Science, University of Milan, via Comelico 39/41, 20135, Milano, Italy
Matteo Re


About this chapter

Cite this chapter

Schowe, B., Morik, K. (2011). Fast-Ensembles of Minimum Redundancy Feature Selection. In: Okun, O., Valentini, G., Re, M. (eds) Ensembles in Machine Learning Applications. Studies in Computational Intelligence, vol 373. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22910-7_5


DOI: https://doi.org/10.1007/978-3-642-22910-7_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22909-1
Online ISBN: 978-3-642-22910-7
eBook Packages: Engineering (R0)
