A Machine Learning-Based QSAR Model for Benzimidazole Derivatives as Corrosion Inhibitors by Incorporating Comprehensive Feature Selection

  • Youquan Liu (corresponding author)
  • Yanzhi Guo (corresponding author)
  • Wengang Wu
  • Ying Xiong
  • Chuan Sun
  • Li Yuan
  • Menglong Li
Original research article



Computational prediction of the inhibition efficiency (IE) of inhibitor molecules is an important complement to experiment when designing novel molecules that efficiently inhibit corrosion on metallic surfaces.


Here we develop a new machine learning-based predictor for the inhibition efficiency (IE) of benzimidazole derivatives.


First, inhibitor molecules were given a comprehensive numerical representation covering energy, electronic, topological, physicochemical and spatial properties based on their 3-D structures, which yielded 150 valid structural descriptors. These descriptors were then investigated thoroughly. A multicollinearity-based clustering analysis was performed to remove linearly correlated feature variables, producing 47 feature clusters. Gini importance from a random forest (RF) was then used to measure the contribution of each descriptor within its cluster, and the descriptor with the highest Gini importance score in each cluster was selected, giving 47 descriptors that are not linearly correlated with one another. Further, considering the limited number of available inhibitors, different feature subsets were constructed according to the Gini importance ranking of the 47 descriptors.
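The selection steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the greedy grouping rule and the 0.9 correlation cutoff are hypothetical stand-ins for the unspecified multicollinearity-based clustering, scikit-learn's impurity-based `feature_importances_` stands in for the Gini importance, and the data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def cluster_by_correlation(X, threshold=0.9):
    """Greedily group columns whose absolute Pearson correlation with a
    cluster's first member exceeds `threshold` (an assumed cutoff)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    clusters = []
    for j in range(X.shape[1]):
        for members in clusters:
            if corr[j, members[0]] > threshold:
                members.append(j)
                break
        else:
            # No existing cluster is close enough: start a new one.
            clusters.append([j])
    return clusters


def select_representatives(X, y, clusters, seed=0):
    """Keep the column with the highest RF impurity importance in each
    cluster, returned in descending order of importance."""
    rf = RandomForestRegressor(n_estimators=200, random_state=seed).fit(X, y)
    imp = rf.feature_importances_
    best = [max(members, key=lambda j: imp[j]) for members in clusters]
    return sorted(best, key=lambda j: imp[j], reverse=True)


# Synthetic stand-in data: 20 "molecules" with 6 "descriptors", where
# columns 3-5 are near-copies of columns 0-2 (strongly collinear).
rng = np.random.default_rng(0)
base = rng.normal(size=(20, 3))
X = np.hstack([base, base * 2.0 + rng.normal(scale=0.01, size=(20, 3))])
y = 3.0 * base[:, 0] + rng.normal(scale=0.1, size=20)

clusters = cluster_by_correlation(X)
ranked = select_representatives(X, y, clusters)
top_k = ranked[:2]  # one candidate feature subset from the ranking
```

Taking the first k entries of `ranked` for increasing k gives the family of feature subsets that the ranking-based construction refers to.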


Finally, support vector machine (SVM) models based on the different feature subsets were evaluated by leave-one-out cross-validation. Comparison showed that the optimal SVM model uses the top 11 descriptors with a polynomial kernel. This model achieves a correlation coefficient (R) of 0.9589 and a root-mean-square error (RMSE) of 4.45, indicating that the proposed method gives the best performance on the current data.
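The evaluation step can be illustrated with a short sketch. It assumes scikit-learn's `SVR` with a polynomial kernel as the SVM regressor. The hyperparameters (`degree`, `C`, `coef0`), the standardization step and the synthetic data are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def loo_r_rmse(X, y, degree=2, C=10.0):
    """Leave-one-out predictions from a polynomial-kernel SVR,
    summarized by the correlation coefficient R and the RMSE."""
    model = make_pipeline(
        StandardScaler(),
        SVR(kernel="poly", degree=degree, C=C, coef0=1.0),
    )
    # Each sample is predicted by a model trained on all other samples.
    pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
    r = float(np.corrcoef(y, pred)[0, 1])
    rmse = float(np.sqrt(np.mean((y - pred) ** 2)))
    return r, rmse


# Synthetic stand-in data: columns are assumed to be already ordered by
# importance, and the target mimics an IE value in percent.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))
y = 80.0 + 5.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.5, size=20)

# Score the top-k feature subsets and keep the one with the lowest RMSE.
results = {k: loo_r_rmse(X[:, :k], y) for k in range(1, 6)}
best_k = min(results, key=lambda k: results[k][1])
```

Leave-one-out is a natural choice when only a small number of inhibitors is available, since every molecule serves once as the held-out test case and the remaining ones are all used for training.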


Based on our model, six new benzimidazole molecules were designed, and their predicted IE values indicate that two of them have high potential as outstanding corrosion inhibitors.


Benzimidazole derivatives; Inhibition efficiency (IE); Machine learning methods; Feature extraction and selection



This work was financially supported by the Major Science and Technology Project of China National Petroleum Co. Ltd (No. 2016E-0609). We also thank the Comprehensive Training Platform of Specialized Laboratory, College of Chemistry, Sichuan University for sample analysis.

Compliance with ethical standards

Conflict of interest

The authors declare no competing financial interests.

Supplementary material

12539_2019_346_MOESM1_ESM.docx (27 kb)
Supplementary material 1 (DOCX 27 kb)



Copyright information

© International Association of Scientists in the Interdisciplinary Areas 2019

Authors and Affiliations

  1. Research Institute of Natural Gas Technology, Petro China Southwest Oil and Gas Field Company, Chengdu, China
  2. College of Chemistry, Sichuan University, Chengdu, People’s Republic of China
