Abstract
The Targeted Estimation of Distribution Algorithm (TEDA) introduces a ‘Targeting’ process into a hybrid of estimation of distribution algorithms (EDAs) and genetic algorithms (GAs), whereby the number of active genes, or ‘control points’, in a solution is driven in an optimal direction. For larger feature selection problems with over a thousand features, traditional methods such as forward and backward selection are inefficient. Traditional evolutionary algorithms (EAs) may perform better, but they are slow to optimize when a problem is so noisy that most large solutions are equally ineffective and effective optimization can begin only once much smaller solutions have been discovered. By using targeting, TEDA drives down the feature set size quickly and so speeds up this process. TEDA was compared with all of these approaches on feature selection problems with between 500 and 20,000 features, and the results confirmed that it finds effective solutions significantly faster than the other approaches.
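The targeting idea can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no procedural detail, so the extrapolation rule in `target_size`, the univariate frequency model in `marginals`, and all function names are assumptions chosen only to show how a target feature count and a probability model might combine when sampling a new feature subset.

```python
import random


def target_size(fitter_size, weaker_size, n_features):
    """Assumed rule: move past the fitter solution's size, away from the weaker one's."""
    target = fitter_size + (fitter_size - weaker_size)
    return max(1, min(n_features, target))  # clamp to a valid subset size


def marginals(selected, n_features):
    """Frequency of each feature among the selected solutions (sets of indices)."""
    counts = [0] * n_features
    for solution in selected:
        for index in solution:
            counts[index] += 1
    return [count / len(selected) for count in counts]


def sample_solution(probabilities, k, rng):
    """Draw k distinct feature indices, weighted by their marginal frequency."""
    remaining = list(range(len(probabilities)))
    weights = [p + 1e-9 for p in probabilities]  # epsilon keeps every weight positive
    chosen = set()
    for _ in range(k):
        position = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.add(remaining.pop(position))
        weights.pop(position)
    return chosen


if __name__ == "__main__":
    rng = random.Random(0)
    n_features = 20
    # Hypothetical selected parents, fitter first; smaller subsets are assumed fitter here.
    parents = [{0, 3, 5}, {0, 3, 5, 8, 9, 12}]
    k = target_size(len(parents[0]), len(parents[1]), n_features)
    child = sample_solution(marginals(parents, n_features), k, rng)
    print(k, sorted(child))
```

In this sketch the fitter parent is smaller than the weaker one, so the target size falls below both and is clamped to at least one feature, which mirrors the abstract's description of feature set size being driven down quickly.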
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Neumann, G., Cairns, D. (2016). A Targeted Estimation of Distribution Algorithm Compared to Traditional Methods in Feature Selection. In: Madani, K., Dourado, A., Rosa, A., Filipe, J., Kacprzyk, J. (eds) Computational Intelligence. IJCCI 2013. Studies in Computational Intelligence, vol 613. Springer, Cham. https://doi.org/10.1007/978-3-319-23392-5_5
DOI: https://doi.org/10.1007/978-3-319-23392-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23391-8
Online ISBN: 978-3-319-23392-5
eBook Packages: Engineering, Engineering (R0)