On the Relevance of Feature Selection Algorithms While Developing Non-linear QSARs

Part of the book series: Methods in Pharmacology and Toxicology (MIPT)

Abstract

Quantitative structure-activity relationships (QSARs) are mathematical models that seek a quantitative relationship between a set of chemical compounds and a specific activity or endpoint, such as toxicity, a chemical or physical property, or a biological activity. To correlate the chemicals with the selected endpoints, QSAR models rely on molecular descriptors (MDs), which encode specific chemical information or features of the molecules. Early QSAR models were based on a small set of MDs and a specific endpoint, and the correlation was usually linear. Today, QSAR models are usually non-linear and are built from thousands of chemicals and hundreds of MDs. In addition, newer QSAR models aim to predict several endpoints with the same model, the so-called multi-target QSAR (MT-QSAR). For this reason, many QSARs are now developed with machine learning approaches that can model a dataset with several endpoints. Although these approaches have proven able to handle MT-QSAR problems, feature selection (FS) in these cases remains a challenging task and a central issue in the QSAR field. Accordingly, the main aim of this chapter is to analyze feature selection methods in the development of non-linear QSAR models.

Acknowledgments

This work received financial support from Fundação para a Ciência e a Tecnologia (FCT/MEC) through national funds and was co-financed by the European Union (FEDER funds) under the Partnership Agreement PT2020, through projects UID/QUI/50006/2013, POCI/01/0145/FEDER/007265, NORTE-01-0145-FEDER-000011 (LAQV@REQUIMTE), and the Interreg SUDOE NanoDesk (SOE1/P1/E0215; UP). RC also acknowledges FCT and the European Social Fund for financial support (Grant SFRH/BPD/80605/2011). The authors are greatly indebted to all financing sources.

The authors declare no competing financial interest.

Author information

Correspondence to Riccardo Concu or M. Natália Dias Soeiro Cordeiro.

Copyright information

© 2020 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Concu, R., Cordeiro, M.N.D.S. (2020). On the Relevance of Feature Selection Algorithms While Developing Non-linear QSARs. In: Roy, K. (ed) Ecotoxicological QSARs. Methods in Pharmacology and Toxicology. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-0150-1_8

  • DOI: https://doi.org/10.1007/978-1-0716-0150-1_8

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-0149-5

  • Online ISBN: 978-1-0716-0150-1

  • eBook Packages: Springer Protocols
