Predicting the Toxicity of Chemical Compounds Using GPTIPS: A Free Genetic Programming Toolbox for MATLAB

  • Dominic P. Searson
  • David E. Leahy
  • Mark J. Willis
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 70)


In this contribution, GPTIPS, a free, open-source MATLAB toolbox for performing symbolic regression by genetic programming (GP), is introduced. GPTIPS is specifically designed to evolve mathematical models of predictor-response data that are “multigene” in nature, i.e. linear combinations of low-order non-linear transformations of the input variables. The functionality of GPTIPS is demonstrated by using it to generate an accurate, compact QSAR (quantitative structure-activity relationship) model of existing toxicity data in order to predict the toxicity of chemical compounds. It is shown that the low-order “multigene” GP methods implemented by GPTIPS can provide both a useful alternative and a complementary approach to currently accepted empirical modelling and data analysis techniques. GPTIPS and its documentation are available for download at
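The “multigene” model structure described above has the form y = b0 + b1·G1(x) + … + bn·Gn(x), where each gene Gi is a low-order non-linear expression of the inputs. What follows is a minimal Python sketch of that structure. It is not GPTIPS code. It assumes the genes have already been found: the three gene expressions in `GENES` are hypothetical and were picked by hand rather than evolved by GP. It also assumes the bias b0 and the gene weights b1…bn are fitted by ordinary least squares. The helper names `design_matrix`, `solve`, `fit_weights` and `predict` are illustrative only.

```python
import math

# Hypothetical genes: low-order non-linear transformations of the input
# vector x. In GPTIPS these would be evolved GP trees; here they are
# fixed by hand purely for illustration.
GENES = [
    lambda x: x[0] * x[1],
    lambda x: math.sqrt(abs(x[1])),
    lambda x: x[0] - x[2],
]


def design_matrix(X, genes):
    """Rows [1, G1(x), ..., Gn(x)]: a bias column plus one column per gene."""
    return [[1.0] + [g(x) for g in genes] for x in X]


def solve(A, b):
    """Solve the square system A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: genes may be linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (M[r][n] - s) / M[r][r]
    return w


def fit_weights(X, y, genes):
    """Least-squares bias and gene weights via the normal equations (G^T G) w = G^T y."""
    G = design_matrix(X, genes)
    k = len(G[0])
    GtG = [[sum(row[i] * row[j] for row in G) for j in range(k)] for i in range(k)]
    Gty = [sum(row[i] * yi for row, yi in zip(G, y)) for i in range(k)]
    return solve(GtG, Gty)


def predict(X, genes, w):
    """Multigene prediction: y_hat = b0 + b1*G1(x) + ... + bn*Gn(x)."""
    return [sum(wi * gi for wi, gi in zip(w, row)) for row in design_matrix(X, genes)]


if __name__ == "__main__":
    true_w = [0.5, 2.0, -1.5, 3.0]
    X = [(a, b, c) for a in range(1, 4) for b in range(1, 4) for c in range(1, 4)]
    y = predict(X, GENES, true_w)
    print(fit_weights(X, y, GENES))  # close to [0.5, 2.0, -1.5, 3.0]
```

The sketch uses the normal equations only so that it needs nothing beyond the standard library. When genes are nearly collinear, a QR- or SVD-based least-squares solver is numerically safer.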


Keywords: Genetic programming, Symbolic regression, QSAR, Toxicity


Copyright information

© Springer Science+Business Media B.V. 2011

Authors and Affiliations

  • Dominic P. Searson (1)
  • David E. Leahy (2)
  • Mark J. Willis (3)
  1. Northern Institute for Cancer Research, Newcastle University, Newcastle upon Tyne, UK
  2. School of Chemical Engineering and Advanced Materials, Newcastle University, Newcastle upon Tyne, UK
  3. School of Chemical Engineering and Advanced Materials, Newcastle University, Newcastle upon Tyne, UK
