Abstract
Model building in supervised learning potentially faces a dual learning problem: identifying both the parameters of the model and the subset of (domain) attributes necessary to support it, that is, an embedded design as opposed to a wrapper- or filter-based one. Genetic Programming (GP) has always addressed this dual problem; however, further implicit assumptions are made that potentially increase the complexity of the resulting solutions. In this work we are specifically interested in classification under very large attribute spaces. In such cases it might be expected that multiple independent or overlapping attribute subspaces support the mapping to class labels, whereas GP approaches to classification generally assume a single binary classifier per class, forcing the model to provide a solution in terms of a single attribute subspace and a single mapping to class labels. Supporting the more general goal is cast as the requirement to identify a ‘team’ of classifiers with non-overlapping behaviors, in which each classifier responds to a different subset of exemplars. Moreover, each team member might utilize its own unique ‘subspace’ of attributes. This work investigates the utility of coevolutionary model building for classification problems with attribute vectors of 650 to 100,000 dimensions. The resulting team-based coevolutionary method, Symbiotic Bid-based (SBB) GP, is compared to the alternative embedded classifier approaches of C4.5 and Maximum Entropy Classification (MaxEnt). SBB solutions demonstrate up to an order of magnitude lower attribute count relative to C4.5 and up to two orders of magnitude lower attribute count than MaxEnt, while retaining comparable or better classification performance. Moreover, no individual model participating within a team ever utilizes more than six attributes, adding a further level of simplicity to the resulting solutions.
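The team idea described above can be illustrated with a small sketch. The abstract does not spell out the bidding mechanism, so the following assumes a simple winner-take-all rule (each team member bids on an exemplar using only its own few attributes, and the highest bidder supplies the class label). The names `Learner`, `classify`, and `attribute_count`, the toy bid functions, and the attribute indices are all hypothetical and are not taken from the chapter.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Learner:
    """Hypothetical team member: a bid function over a few attributes plus a fixed label."""
    attributes: Sequence[int]                 # indices of the attributes this learner reads
    bid: Callable[[List[float]], float]       # maps the selected attribute values to a bid
    label: int                                # class label suggested if this learner wins

    def bid_on(self, exemplar: Sequence[float]) -> float:
        # Only the learner's own attribute subspace is ever indexed.
        return self.bid([exemplar[i] for i in self.attributes])


def classify(team: Sequence[Learner], exemplar: Sequence[float]) -> int:
    """Assumed winner-take-all rule: the highest bidder's label is the team output."""
    winner = max(team, key=lambda learner: learner.bid_on(exemplar))
    return winner.label


def attribute_count(team: Sequence[Learner]) -> int:
    """Number of distinct attributes indexed by the team as a whole."""
    return len({i for learner in team for i in learner.attributes})


# Toy team over a 650-dimensional attribute space (illustrative values only).
team = [
    Learner(attributes=(3, 17), bid=lambda v: v[0] - v[1], label=0),
    Learner(attributes=(17, 402), bid=lambda v: v[0] + v[1], label=1),
]

exemplar_a = [0.0] * 650
exemplar_a[3] = 2.0
exemplar_a[402] = 0.5

exemplar_b = [0.0] * 650
exemplar_b[17] = 1.0

print(classify(team, exemplar_a), classify(team, exemplar_b), attribute_count(team))
```

In this toy setting the two learners overlap on one attribute, so the team as a whole indexes three attributes out of 650 while each member reads only two. That is the kind of per-member and per-team attribute count the abstract compares against C4.5 and MaxEnt.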
Copyright information
© 2010 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Doucette, J., Lichodzijewski, P., Heywood, M. (2010). Evolving Coevolutionary Classifiers Under Large Attribute Spaces. In: Riolo, R., O'Reilly, UM., McConaghy, T. (eds) Genetic Programming Theory and Practice VII. Genetic and Evolutionary Computation. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-1626-6_3
DOI: https://doi.org/10.1007/978-1-4419-1626-6_3
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-1653-2
Online ISBN: 978-1-4419-1626-6
eBook Packages: Computer Science, Computer Science (R0)