Robust and Sparse Estimation of the Inverse Covariance Matrix Using Rank Correlation Measures

  • Christophe Croux
  • Viktoria Öllerer
Conference paper

Abstract

Spearman’s rank correlation is a robust alternative to the standard correlation coefficient. Because it uses ranks instead of the actual values of the observations, the impact of outliers remains limited. In this paper, we study an estimator based on this rank correlation measure for estimating covariance matrices and their inverses. The resulting estimator is robust and consistent at the normal distribution. By applying the graphical lasso, the inverse covariance matrix estimator is positive definite even if more variables than observations are available in the data set. Moreover, it contains many zeros and is therefore said to be sparse. Instead of Spearman’s rank correlation, one can use Kendall correlation, Quadrant correlation or Gaussian rank scores. A simulation study compares the different estimators. This type of estimator is particularly useful for estimating (inverse) covariance matrices in high dimensions, when the data may contain outliers in many cells of the data matrix. More traditional robust estimators are not well defined or not computable in this setting. An important feature of the proposed estimators is their simplicity and ease of computation using existing software.
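
As a rough illustration of the first step described above, the following Python sketch builds a rank-based covariance matrix from a data matrix. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`rank_columns`, `spearman_correlation`, `rank_covariance`) are hypothetical, the transform 2 sin(πr/6) is the standard relation that makes Spearman's correlation consistent for the correlation of a bivariate normal distribution, and the median absolute deviation (MAD) is used here as one possible robust scale estimator, which may differ from the scale estimator used in the paper.

```python
import numpy as np


def rank_columns(x):
    """Return 1-based ranks for each column of x; tied values share their mean rank."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty_like(x)
    for j in range(x.shape[1]):
        _, inverse, counts = np.unique(
            x[:, j], return_inverse=True, return_counts=True
        )
        last = np.cumsum(counts)  # highest rank within each group of ties
        ranks[:, j] = (last - (counts - 1) / 2.0)[inverse]
    return ranks


def spearman_correlation(x):
    """Spearman correlation matrix: Pearson correlation of the column ranks."""
    return np.corrcoef(rank_columns(x), rowvar=False)


def rank_covariance(x):
    """Rank-based covariance sketch (assumption: MAD as the robust scale).

    The elementwise transform 2*sin(pi/6 * r) maps Spearman's correlation to
    the Pearson correlation it corresponds to under normality.
    """
    x = np.asarray(x, dtype=float)
    corr = 2.0 * np.sin(np.pi / 6.0 * spearman_correlation(x))
    np.fill_diagonal(corr, 1.0)  # guard against rounding on the diagonal
    med = np.median(x, axis=0)
    # 1.4826 makes the MAD consistent for the standard deviation at the normal
    scale = 1.4826 * np.median(np.abs(x - med), axis=0)
    return corr * np.outer(scale, scale)
```

The matrix returned by `rank_covariance` would then replace the sample covariance matrix as input to a graphical lasso solver, which yields the sparse inverse covariance estimate. Because the sine transform is applied elementwise, positive semidefiniteness of the transformed matrix is not guaranteed, so a solver may require a correction step first.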

Keywords

  • Positive Semidefinite
  • Robust Estimator
  • Sample Covariance Matrix
  • Breakdown Point
  • Precision Matrix
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgments

The authors wish to acknowledge the support from the GOA/12/014 project of the Research Fund KU Leuven. We also would like to thank the referees for their constructive comments that improved the paper considerably.

Copyright information

© Springer India 2016

Authors and Affiliations

  1. ORSTAT, Faculty of Economics and Business, KU Leuven, Leuven, Belgium
