
Monte Carlo method for identification of outlier molecules in QSAR studies

  • Tarko Laszlo
Original Paper

Abstract

The paper discusses some difficulties that arise when the classical formula is applied to identify “outliers” in a given set of objects. It proposes a new Monte Carlo-like method for identifying “outliers” in the calibration set used in QSPR/QSAR computations. Subsets of molecules are randomly extracted thousands of times from the given calibration set. The method relies on the idea that the presence of “outlier” molecules in a subset decreases the prediction power of the QSAR equation derived from that subset: “outlier” molecules often lead to poor-quality QSAR equations and rarely to high-quality ones. The paper proposes a specific formula for the “outlier index”. The molecule with the highest outlier index is eliminated from the calibration set, and the identification/elimination process is repeated until the maximum value of the outlier index stops decreasing. The paper presents five examples of outlier identification using various kinds of calibration sets. We compare the results with those obtained by a classical outlier index formula, using the same calibration set, the same set of descriptors and the same outlier identification/elimination procedure.
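The abstract does not give the paper's outlier-index formula or its exact measure of prediction power, so the following is only a minimal Python sketch of the general procedure, under stated assumptions. It assumes ordinary least-squares QSAR equations, uses the R² of each subset fit as a stand-in for prediction power, and uses a hypothetical index, (occurrences in the worst 10% of subsets + 1) / (occurrences in the best 10% + 1), which rises for molecules that often appear in poor equations and rarely in good ones. It also assumes one reading of the stopping rule: a removal is kept only if it lowers the maximum index. The names `fit_r2`, `outlier_indices` and `eliminate_outliers` and all default parameters are illustrative, not taken from the paper.

```python
import random


def fit_r2(X, y):
    """Least-squares fit y ~ b0 + b.x; return R^2, or None if degenerate."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    k = len(rows[0])
    # Normal equations A b = c with A = X^T X and c = X^T y.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[piv][col]) < 1e-12:
            return None  # singular system: skip this subset
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for j in range(col, k):
                A[r][j] -= f * A[col][j]
            c[r] -= f * c[col]
    # Back substitution on the upper-triangular system.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - sum(bj * rj for bj, rj in zip(b, r))) ** 2
                 for r, yi in zip(rows, y))
    return None if ss_tot == 0 else 1.0 - ss_res / ss_tot


def outlier_indices(X, y, members, subset_size, n_trials, tail, rng):
    """HYPOTHETICAL index: (count in worst-tail subsets + 1) / (count in best-tail + 1)."""
    trials = []
    for _ in range(n_trials):
        subset = rng.sample(members, subset_size)  # random subset of molecules
        r2 = fit_r2([X[i] for i in subset], [y[i] for i in subset])
        if r2 is not None:
            trials.append((r2, subset))
    trials.sort(key=lambda t: t[0])  # worst equations first
    n_tail = max(1, int(tail * len(trials)))
    bad = {i: 0 for i in members}
    good = {i: 0 for i in members}
    for _, subset in trials[:n_tail]:
        for i in subset:
            bad[i] += 1
    for _, subset in trials[-n_tail:]:
        for i in subset:
            good[i] += 1
    return {i: (bad[i] + 1) / (good[i] + 1) for i in members}


def eliminate_outliers(X, y, subset_size=10, n_trials=2000, tail=0.1, seed=0):
    """Remove the top-index molecule while the maximum index keeps decreasing."""
    rng = random.Random(seed)
    members = list(range(len(y)))
    idx = outlier_indices(X, y, members, subset_size, n_trials, tail, rng)
    best_max = max(idx.values())
    removed = []
    # Keep more molecules than subset_size so the random subsets still differ.
    while len(members) - 1 > subset_size:
        candidate = max(idx, key=idx.get)
        reduced = [i for i in members if i != candidate]
        new_idx = outlier_indices(X, y, reduced, subset_size, n_trials, tail, rng)
        new_max = max(new_idx.values())
        if new_max >= best_max:
            break  # maximum stopped decreasing: keep the candidate, stop
        members, idx, best_max = reduced, new_idx, new_max
        removed.append(candidate)
    return removed, members


if __name__ == "__main__":
    data_rng = random.Random(42)
    X = [[float(x)] for x in range(20)]  # one synthetic descriptor
    y = [2.0 * x[0] + 1.0 + data_rng.gauss(0.0, 0.5) for x in X]
    y[7] += 15.0  # plant one artificial outlier
    removed, kept = eliminate_outliers(X, y)
    print("removed:", removed)
```

On this synthetic set the planted molecule 7 should be the first one removed, because every subset that contains it yields a markedly worse equation. Because the index is estimated by random sampling, later rounds may also remove a clean molecule by chance; a real implementation would need the paper's own index formula and stopping criterion.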

Keywords

Monte Carlo · Outliers · QSAR

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Center of Organic Chemistry “C. D. Nenitzescu”, Romanian Academy, Bucharest, Romania
