
On Coupling Robust Estimation with Regularization for High-Dimensional Data

Conference paper in: Data Science

Abstract

Standard data mining procedures are sensitive to the presence of outlying measurements in the data. Robust data mining procedures, which are resistant to outliers, are therefore highly desirable. This work aims to propose new robust classification procedures for high-dimensional data, together with algorithms for their efficient computation. In particular, we use the idea of implicit weights assigned to individual observations to propose several robust regularized versions of linear discriminant analysis (LDA) that are suitable for data in which the number of variables exceeds the number of observations. The approach is based on a regularized version of the minimum weighted covariance determinant (MWCD) estimator and represents a novel attempt to combine regularization with high robustness, allowing outlying observations to be down-weighted. The classification performance of the new methods is illustrated on real fMRI data acquired in neuroscience research.
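
The abstract does not spell out the estimator, so the following is only a minimal pure-Python sketch, under our own assumptions, of how the three ingredients it names can fit together: implicit weights that depend on the rank of each observation's Mahalanobis distance, a weighted pooled covariance matrix that is regularized so it stays invertible, and an LDA classification rule built on top of it. The weight function `rank_weights`, the trimming fraction `alpha`, the shrinkage target (tr(S)/p)·I with intensity `lam`, and the simple reweighting loop in `fit` are all hypothetical choices for illustration; this is not the authors' regularized MWCD algorithm.

```python
import math


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rank_weights(n, h):
    # Hypothetical weight function: non-increasing in the rank of an observation's
    # distance (rank 0 = closest), linear over the h closest points, zero beyond.
    return [max(0.0, 1.0 - i / h) for i in range(n)]


def weighted_mean(xs, w):
    total = sum(w)
    return [sum(wi * x[j] for wi, x in zip(w, xs)) / total for j in range(len(xs[0]))]


def regularized_cov(groups, weights, means, lam):
    # Weighted pooled within-class covariance, shrunk toward (trace / p) * I.
    p = len(means[0])
    s = [[0.0] * p for _ in range(p)]
    total = 0.0
    for xs, w, mu in zip(groups, weights, means):
        for wi, x in zip(w, xs):
            d = [x[j] - mu[j] for j in range(p)]
            for a in range(p):
                for b in range(p):
                    s[a][b] += wi * d[a] * d[b]
            total += wi
    target = sum(s[j][j] for j in range(p)) / (p * total)
    return [[(1 - lam) * s[a][b] / total + (lam * target if a == b else 0.0)
             for b in range(p)] for a in range(p)]


def mahalanobis2(x, mu, cov):
    d = [xi - mi for xi, mi in zip(x, mu)]
    return dot(d, solve(cov, d))


def fit(groups, lam=0.2, alpha=0.75, n_iter=10):
    """Reweighting loop: distances -> ranks -> implicit weights -> re-estimate."""
    weights = [[1.0] * len(xs) for xs in groups]
    for _ in range(n_iter):
        means = [weighted_mean(xs, w) for xs, w in zip(groups, weights)]
        cov = regularized_cov(groups, weights, means, lam)
        weights = []
        for xs, mu in zip(groups, means):
            d2 = [mahalanobis2(x, mu, cov) for x in xs]
            order = sorted(range(len(xs)), key=lambda i: d2[i])
            rw = rank_weights(len(xs), max(1, math.ceil(alpha * len(xs))))
            w = [0.0] * len(xs)
            for rank, i in enumerate(order):
                w[i] = rw[rank]  # weight depends only on the rank of the distance
            weights.append(w)
    means = [weighted_mean(xs, w) for xs, w in zip(groups, weights)]
    return means, regularized_cov(groups, weights, means, lam), weights


def predict(x, means, cov):
    # Linear discriminant with equal priors: x' S^-1 mu_k - mu_k' S^-1 mu_k / 2.
    scores = []
    for mu in means:
        z = solve(cov, mu)
        scores.append(dot(z, x) - 0.5 * dot(z, mu))
    return max(range(len(means)), key=lambda k: scores[k])
```

For example, with `groups = [[[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5], [20, 20]], [[5, 5], [6, 5], [5, 6], [6, 6], [5.5, 5.5]]]`, the point `[20, 20]` ends up with implicit weight zero, so it no longer pulls the class mean or the covariance. With `lam > 0`, every eigenvalue of the shrunken matrix is at least `lam * tr(S) / p`, so it remains invertible even when the number of variables exceeds the number of observations, which is the role regularization plays in the abstract.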



Acknowledgements

The work is supported by the project “National Institute of Mental Health (NIMH-CZ)”, grant number CZ.1.05/2.1.00/03.0078 of the European Regional Development Fund, Neuron Fund for Support of Science, and the Czech Science Foundation project No. 13-23940S.

Author information

Correspondence to Jan Kalina.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kalina, J., Hlinka, J. (2017). On Coupling Robust Estimation with Regularization for High-Dimensional Data. In: Palumbo, F., Montanari, A., Vichi, M. (eds) Data Science. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-55723-6_2
