Abstract
Standard data mining procedures are sensitive to outlying measurements in the data, so robust procedures that resist outliers are highly desirable. This work proposes new robust classification procedures for high-dimensional data, together with algorithms for their efficient computation. In particular, we use the idea of implicit weights assigned to individual observations to propose several robust regularized versions of linear discriminant analysis (LDA) suitable for data in which the number of variables exceeds the number of observations. The approach is based on a regularized version of the minimum weighted covariance determinant (MWCD) estimator and represents a unique attempt to combine regularization with high robustness, making it possible to down-weight outlying observations. The classification performance of the new methods is illustrated on real fMRI data acquired in neuroscience research.
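To make the idea of implicit weights combined with regularization more concrete, the following is a minimal sketch and not the chapter's algorithm. It assumes linearly decreasing rank-based weights, shrinkage of the weighted covariance matrix toward the identity with a fixed parameter `lam`, a simple reweighting loop started from the coordinate-wise median, and an equally weighted pooled covariance matrix in the LDA rule. All function names and parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def rank_weights(dist):
    """Implicit weights: linearly decreasing in the rank of the distance."""
    n = len(dist)
    ranks = np.argsort(np.argsort(dist))      # 0 for the smallest distance
    w = (n - ranks).astype(float)             # n, n-1, ..., 1
    return w / w.sum()

def weighted_mean_cov(X, w, lam):
    """Weighted mean and weighted covariance shrunk toward the identity."""
    mean = w @ X
    Xc = X - mean
    S = (Xc * w[:, None]).T @ Xc
    return mean, (1.0 - lam) * S + lam * np.eye(X.shape[1])

def fit_robust(X, lam=0.5, n_iter=10):
    """Heuristic reweighting loop (an assumption, not the chapter's algorithm)."""
    # Robust start: squared Euclidean distance from the coordinate-wise median.
    d = np.sum((X - np.median(X, axis=0)) ** 2, axis=1)
    w = rank_weights(d)
    for _ in range(n_iter):
        mean, S = weighted_mean_cov(X, w, lam)
        diff = X - mean
        # Squared Mahalanobis distances under the regularized covariance.
        d = np.einsum("ij,ij->i", diff, np.linalg.solve(S, diff.T).T)
        w = rank_weights(d)
    mean, S = weighted_mean_cov(X, w, lam)
    return mean, S, w

def lda_predict(x, means, S, priors):
    """Linear discriminant rule with a shared covariance matrix S."""
    scores = [m @ np.linalg.solve(S, x) - 0.5 * m @ np.linalg.solve(S, m) + np.log(p)
              for m, p in zip(means, priors)]
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
n, p = 10, 20                                  # more variables than observations
X0 = rng.normal(size=(n, p))
X1 = rng.normal(size=(n, p)) + 4.0
X0[0] += 10.0                                  # one gross outlier in class 0

m0, S0, w0 = fit_robust(X0)
m1, S1, w1 = fit_robust(X1)
S_pooled = 0.5 * (S0 + S1)                     # equal class sizes
label = lda_predict(np.full(p, 4.0), [m0, m1], S_pooled, [0.5, 0.5])
print(int(np.argmin(w0)), label)
```

Shrinking toward the identity keeps the covariance estimate invertible even though there are fewer observations than variables, while the rank-based weights limit the influence of the contaminated observation on both the mean and the covariance estimate.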
Acknowledgements
The work is supported by the project “National Institute of Mental Health (NIMH-CZ)”, grant number CZ.1.05/2.1.00/03.0078 of the European Regional Development Fund, Neuron Fund for Support of Science, and the Czech Science Foundation project No. 13-23940S.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Kalina, J., Hlinka, J. (2017). On Coupling Robust Estimation with Regularization for High-Dimensional Data. In: Palumbo, F., Montanari, A., Vichi, M. (eds) Data Science. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-55723-6_2
DOI: https://doi.org/10.1007/978-3-319-55723-6_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55722-9
Online ISBN: 978-3-319-55723-6
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)