Comparison of AdaBoost and Genetic Programming for Combining Neural Networks for Drug Discovery

  • W. B. Langdon
  • S. J. Barrett
  • B. F. Buxton
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2611)

Abstract

Genetic programming (GP) based data fusion and AdaBoost can both improve in vitro prediction of cytochrome P450 activity by combining artificial neural networks (ANNs). Pharmaceutical drug design data provided by high throughput screening (HTS) are used to train many base ANN classifiers. As in any data mining (KDD) task, overfitting must be avoided. The ensembles do extrapolate from the training data to unseen molecules, i.e. they predict inhibition of a P450 enzyme by compounds unlike the chemicals used to train them. Thus the models might provide in silico screens of virtual chemicals as well as of physical ones from GlaxoSmithKline (GSK)'s cheminformatics database. The receiver operating characteristics (ROC) of the boosted and evolved ensembles are given.
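
Since the abstract compares ensembles by their receiver operating characteristics, a minimal sketch of how ROC points and the area under the curve can be computed from classifier scores may help. It is an illustration only, not the authors' evaluation code: the function names `roc_points` and `auc` and the toy scores are hypothetical, and labels are assumed to be 1 for an inhibitor and 0 otherwise.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points.

    The decision threshold is swept from the highest score to the lowest;
    examples with tied scores are added to the counts together.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")

    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example that shares this score.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Toy data: four compounds, two of them inhibitors (label 1).
    toy_scores = [0.9, 0.8, 0.7, 0.6]
    toy_labels = [1, 0, 1, 0]
    curve = roc_points(toy_scores, toy_labels)
    print(curve)
    print(auc(curve))
```

Each point on the curve corresponds to one choice of threshold, which is why ROC curves are a natural way to compare classifiers when the relative cost of false positives and false negatives is not fixed in advance.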

Keywords

Receiver Operating Characteristic, Convex Hull, Genetic Programming, Receiver Operating Characteristic Curve

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • W. B. Langdon (1)
  • S. J. Barrett (1)
  • B. F. Buxton (2)
  1. Data Exploration Sciences, GlaxoSmithKline, Research and Development, Greenford, Middlesex, UK
  2. Computer Science, University College, London, UK
