On Coupling Robust Estimation with Regularization for High-Dimensional Data

Kalina, Jan; Hlinka, Jaroslav

doi:10.1007/978-3-319-55723-6_2

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

Standard data mining procedures are sensitive to the presence of outlying measurements in the data. Robust data mining procedures, which are resistant to outliers, are therefore highly desirable. This work aims to propose new robust classification procedures for high-dimensional data, together with algorithms for their efficient computation. In particular, we use the idea of implicit weights assigned to individual observations to propose several robust regularized versions of linear discriminant analysis (LDA) that are suitable for data in which the number of variables exceeds the number of observations. The approach is based on a regularized version of the minimum weighted covariance determinant (MWCD) estimator and represents a unique attempt to combine regularization with high robustness, allowing outlying observations to be down-weighted. The classification performance of the new methods is illustrated on real fMRI data acquired in neuroscience research.
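The abstract combines two ingredients: implicit weights, where each observation receives the weight belonging to the rank of its distance from the centre (as in the MWCD estimator), and regularization, which keeps the covariance estimate invertible when the number of variables p exceeds the number of observations n. The Python/NumPy sketch below illustrates one way these could be coupled inside an LDA classifier. It is not the authors' algorithm: the linearly decreasing weight function, the shrinkage toward the diagonal with a fixed λ (`lam`), the concentration-step-style iteration, the omission of any consistency factor, the pooling of class covariances, and all function names (`linear_weights`, `weighted_estimates`, `regularized_mwcd`, `fit_robust_rlda`, `predict`) are assumptions made for illustration.

```python
import numpy as np


def linear_weights(n, h):
    """Hypothetical weight function: linearly decreasing weights for the h
    best-ranked observations and zero for the remaining n - h; sums to 1."""
    w = np.zeros(n)
    w[:h] = np.linspace(1.0, 1.0 / h, h)  # nonincreasing in the rank
    return w / w.sum()


def weighted_estimates(X, weights, lam):
    """Weighted mean and weighted covariance, the latter shrunk toward its own
    diagonal (assumed target) so that it stays invertible when p > n."""
    mu = weights @ X
    R = X - mu
    S = (R * weights[:, None]).T @ R
    S_reg = (1.0 - lam) * S + lam * np.diag(np.diag(S))
    return mu, S_reg


def regularized_mwcd(X, h, lam=0.5, n_iter=20):
    """Concentration-step-style sketch of a regularized MWCD-type estimator.

    Weights are implicit: observation i receives the weight belonging to the
    rank of its squared Mahalanobis distance, so the n - h most outlying
    observations get weight zero."""
    n = X.shape[0]
    w_by_rank = linear_weights(n, h)
    weights = np.full(n, 1.0 / n)  # start from equal weights
    for _ in range(n_iter):
        mu, S_reg = weighted_estimates(X, weights, lam)
        R = X - mu
        # squared Mahalanobis distances r_i^T S_reg^{-1} r_i
        d2 = np.einsum("ij,ij->i", R, np.linalg.solve(S_reg, R.T).T)
        ranks = np.argsort(np.argsort(d2))  # rank 0 = smallest distance
        new_weights = w_by_rank[ranks]
        if np.allclose(new_weights, weights):
            break
        weights = new_weights
    mu, S_reg = weighted_estimates(X, weights, lam)
    return mu, S_reg, weights


def fit_robust_rlda(X, y, h_frac=0.75, lam=0.5):
    """Hypothetical robust regularized LDA: robust class means and a pooled
    regularized covariance built from per-class regularized MWCD fits."""
    classes = np.unique(y)
    means, covs, sizes = [], [], []
    for k in classes:
        Xk = X[y == k]
        h = max(2, int(h_frac * len(Xk)))
        mu, S_reg, _ = regularized_mwcd(Xk, h, lam)
        means.append(mu)
        covs.append(S_reg)
        sizes.append(len(Xk))
    sizes = np.asarray(sizes, dtype=float)
    pooled = sum(n_k * S for n_k, S in zip(sizes, covs)) / sizes.sum()
    return classes, np.array(means), np.linalg.inv(pooled), np.log(sizes / sizes.sum())


def predict(model, X):
    """Standard LDA rule: argmax_k x^T P mu_k - mu_k^T P mu_k / 2 + log prior_k."""
    classes, means, P, log_priors = model
    scores = (X @ P @ means.T
              - 0.5 * np.einsum("kj,jl,kl->k", means, P, means)
              + log_priors)
    return classes[np.argmax(scores, axis=1)]
```

Larger values of `lam` shrink more strongly toward the diagonal, and `h_frac` controls how many observations per class can receive positive weight; neither default is taken from the chapter.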

Acknowledgements

The work is supported by the project “National Institute of Mental Health (NIMH-CZ)”, grant number CZ.1.05/2.1.00/03.0078 of the European Regional Development Fund, Neuron Fund for Support of Science, and the Czech Science Foundation project No. 13-23940S.

Author information

Authors and Affiliations

Institute of Computer Science of the Czech Academy of Sciences, Pod Vodárenskou věží 2, 182 07, Prague, Czech Republic
Jan Kalina & Jaroslav Hlinka
National Institute of Mental Health, Topolová 748, 250 67, Klecany, Czech Republic
Jan Kalina & Jaroslav Hlinka

Corresponding author

Correspondence to Jan Kalina.

Editor information

Editors and Affiliations

Department of Political Sciences, University of Naples Federico II, Napoli, Italy
Francesco Palumbo
Department of Statistical Sciences Paolo Fortunati, Alma Mater Studiorum, University of Bologna, Bologna, Italy
Angela Montanari
Department of Statistical Sciences, Sapienza University of Rome, Rome, Italy
Maurizio Vichi

About this paper

Cite this paper

Kalina, J., Hlinka, J. (2017). On Coupling Robust Estimation with Regularization for High-Dimensional Data. In: Palumbo, F., Montanari, A., Vichi, M. (eds) Data Science. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-55723-6_2

DOI: https://doi.org/10.1007/978-3-319-55723-6_2
Published: 05 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55722-9
Online ISBN: 978-3-319-55723-6
eBook Packages: Mathematics and Statistics (R0)
