Comparison of AdaBoost and Genetic Programming for Combining Neural Networks for Drug Discovery
Genetic programming (GP) based data fusion and AdaBoost can both improve in vitro prediction of Cytochrome P450 activity by combining artificial neural networks (ANNs). Pharmaceutical drug design data provided by high-throughput screening (HTS) are used to train many base ANN classifiers. In data mining (KDD) we must avoid overfitting. The ensembles do extrapolate from the training data to other unseen molecules; that is, they predict inhibition of a P450 enzyme by compounds unlike the chemicals used to train them. Thus the models might provide in silico screens of virtual chemicals as well as physical ones from GlaxoSmithKline (GSK)’s cheminformatics database. The receiver operating characteristics (ROC) of the boosted and evolved ensembles are given.
Keywords: Receiver Operating Characteristic, Convex Hull, Genetic Programming, Receiver Operating Characteristic Curve
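A common way to compare sets of classifiers in ROC space is to take the convex hull of their operating points. The sketch below is illustrative only and is not taken from the paper: the function names and the three (false positive rate, true positive rate) points are made up for the example. It builds the upper hull between the trivial classifiers (0, 0) and (1, 1) and computes the area under it.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points):
    """Upper convex hull of ROC points, ordered from (0, 0) to (1, 1).

    `points` is an iterable of (false_positive_rate, true_positive_rate)
    pairs, one per classifier or threshold setting.
    """
    # The "always negative" and "always positive" classifiers are always available.
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the previous point while it lies on or below the line
        # joining its predecessor to the new point.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def area_under(curve):
    """Trapezoidal area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(curve, curve[1:]))


# Hypothetical operating points for three classifiers (not data from the paper).
classifiers = [(0.1, 0.6), (0.4, 0.7), (0.3, 0.9)]
hull = roc_convex_hull(classifiers)
print(hull)
print(area_under(hull))
```

Here the point (0.4, 0.7) falls below the hull, so it is dropped. An ensemble is of interest in this framing when its ROC curve lies above the hull of its base classifiers.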