Robust Methods in QSAR

  • Beata Walczak
  • Michał Daszykowski
  • Ivana Stanimirova
Part of the Challenges and Advances in Computational Chemistry and Physics book series (COCH, volume 8)


Considerable progress has been made in recent years in developing robust methods as efficient tools for processing data contaminated with outlying objects. Outliers in QSAR studies usually result from improper calculation of some molecular descriptors and/or experimental error in determining the property to be modelled. They strongly influence any least-squares model, so conclusions about the biological activity of a potential component drawn from such a model are misleading. Robust approaches solve this problem by building a model that describes the majority of the data well. On the other hand, the proper identification of outliers may point to a new direction in drug development. Outliers can be assessed only with robust methods, and these methods are described in this chapter.
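The sensitivity of least squares to a single outlier can be illustrated with a minimal sketch. It is not taken from the chapter: the data are hypothetical, and the robust fit uses the Theil-Sen estimator (median of pairwise slopes) purely as a simple stand-in for the robust PCA and robust PLS methods the chapter covers. The function names `ols_line` and `theil_sen_line` are illustrative assumptions.

```python
import numpy as np
from itertools import combinations


def ols_line(x, y):
    """Ordinary least-squares slope and intercept of a straight line."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept


def theil_sen_line(x, y):
    """Theil-Sen line: median of pairwise slopes, median-based intercept."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    slope = np.median(slopes)
    intercept = np.median(y - slope * x)
    return slope, intercept


# Hypothetical data: y = 2x + 1 exactly, with one gross outlier.
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0
y[9] = 100.0  # the uncontaminated value would be 19

ols_slope, ols_intercept = ols_line(x, y)
ts_slope, ts_intercept = theil_sen_line(x, y)

# Residuals from the robust fit: the contaminated object stands out.
robust_residuals = np.abs(y - (ts_slope * x + ts_intercept))
suspect = int(np.argmax(robust_residuals))

print(f"least squares: slope={ols_slope:.2f}, intercept={ols_intercept:.2f}")
print(f"Theil-Sen:     slope={ts_slope:.2f}, intercept={ts_intercept:.2f}")
print(f"largest robust residual at object index {suspect}")
```

In this toy case the least-squares line is pulled towards the contaminated object, while the robust line follows the majority of the data, so the outlier is exposed by its large residual from the robust fit.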


Keywords: Outliers, Robust PCA, Robust PLS



Copyright information

© Springer Science+Business Media B.V. 2010

Authors and Affiliations

  • Beata Walczak (1, corresponding author)
  • Michał Daszykowski (1)
  • Ivana Stanimirova (1)
  1. Department of Chemometrics, Institute of Chemistry, The University of Silesia, Katowice, Poland
