Wisdom of Crowds: An Empirical Study of Ensemble-Based Feature Selection Strategies

Susnjak, Teo; Kerry, David; Barczak, Andre; Reyes, Napoleon; Gal, Yaniv

doi:10.1007/978-3-319-26350-2_47

Conference paper
First Online: 22 November 2015

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9457)

Abstract

The accuracy of feature selection methods is affected by both the nature of the underlying datasets and the machine learning algorithms they are combined with. The role these factors play in the final accuracy of the classifiers is generally unknown in advance. This paper presents an ensemble-based feature selection approach that addresses this uncertainty and mitigates the variability in the generalisation of the classifiers. The study conducts extensive experiments with combinations of three feature selection methods on nine datasets, which are used to train eight different types of machine learning algorithms. The results confirm that ensemble-based approaches to feature selection tend to produce classifiers that have higher accuracies, are more reliable owing to decreased variance, and are thus more generalisable.
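
As a rough illustration of what combining several feature selection methods can look like, the sketch below lets three simple scorers each nominate their top-ranked features and keeps the features that receive enough votes. The abstract does not name the three methods or the combination rule the study uses, so the scorers (`abs_correlation`, `mean_gap`, `fisher_score`), the voting rule, and the toy data are all assumptions made for this example, not the authors' procedure.

```python
from collections import Counter
from statistics import mean, pvariance


def abs_correlation(column, labels):
    """Absolute Pearson correlation between one feature column and the labels."""
    mx, my = mean(column), mean(labels)
    cov = sum((x - mx) * (y - my) for x, y in zip(column, labels))
    sx = sum((x - mx) ** 2 for x in column) ** 0.5
    sy = sum((y - my) ** 2 for y in labels) ** 0.5
    return abs(cov / (sx * sy)) if sx and sy else 0.0


def mean_gap(column, labels):
    """Absolute difference between the two class means (binary 0/1 labels)."""
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y == 0]
    return abs(mean(pos) - mean(neg))


def fisher_score(column, labels):
    """Squared class-mean gap divided by the summed within-class variances."""
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y == 0]
    spread = pvariance(pos) + pvariance(neg)
    return (mean(pos) - mean(neg)) ** 2 / spread if spread else 0.0


def top_k(rows, labels, scorer, k):
    """Indices of the k features that `scorer` ranks highest."""
    columns = list(zip(*rows))
    scores = [scorer(col, labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def ensemble_select(rows, labels, scorers, k, min_votes):
    """Keep features nominated by at least `min_votes` of the selectors."""
    votes = Counter()
    for scorer in scorers:
        votes.update(top_k(rows, labels, scorer, k))
    return sorted(j for j, n in votes.items() if n >= min_votes)


# Hypothetical toy data: feature 0 separates the classes, feature 1 is
# uninformative, feature 2 is weakly informative.
rows = [
    [1.0, 5.0, 0.2],
    [1.1, 3.0, 0.9],
    [0.9, 4.0, 0.1],
    [3.0, 5.0, 0.8],
    [3.2, 3.0, 0.3],
    [2.9, 4.0, 0.7],
]
labels = [0, 0, 0, 1, 1, 1]
scorers = [abs_correlation, mean_gap, fisher_score]

selected = ensemble_select(rows, labels, scorers, k=1, min_votes=2)
print(selected)
```

Setting `min_votes` to 1 corresponds to taking the union of the individual selections, and setting it to the number of scorers corresponds to taking their intersection; values in between act as a majority-style vote.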

Notes

1. These datasets were provided by Compac Sorting Ltd., a company that specialises in automated fruit sorting via image processing.

Author information

Authors and Affiliations

Massey University, Auckland, New Zealand
Teo Susnjak, David Kerry, Andre Barczak & Napoleon Reyes
Compac Ltd., Auckland, New Zealand
Yaniv Gal

Corresponding author

Correspondence to Teo Susnjak.

Editor information

Editors and Affiliations

The University of Waikato, Hamilton, New Zealand
Bernhard Pfahringer
The Australian National University, Canberra, Australian Capital Territory, Australia
Jochen Renz

About this paper

Cite this paper

Susnjak, T., Kerry, D., Barczak, A., Reyes, N., Gal, Y. (2015). Wisdom of Crowds: An Empirical Study of Ensemble-Based Feature Selection Strategies. In: Pfahringer, B., Renz, J. (eds) AI 2015: Advances in Artificial Intelligence. AI 2015. Lecture Notes in Computer Science, vol 9457. Springer, Cham. https://doi.org/10.1007/978-3-319-26350-2_47

DOI: https://doi.org/10.1007/978-3-319-26350-2_47
Published: 22 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26349-6
Online ISBN: 978-3-319-26350-2
eBook Packages: Computer Science, Computer Science (R0)
