Abstract
The concept of regression quantiles provides a natural approach to the analysis of the general linear model. In addition to providing methods of statistical inference, the regression quantile computation provides useful information concerning the presence of outliers. Two methods based on regression quantiles are suggested: one involves “peeling” observations fit exactly by extreme quantiles, and the other comes more directly from the computation of the quantile function. Simulated data sets containing outliers are used to compare these methods with Cook’s D diagnostic and with Rousseeuw’s method based on a high breakdown “least median of squares”-type estimator. Although all methods fare moderately well in these trials, the “peeling” method is clearly the most efficient at identifying outliers. In an effort to explain the relatively poorer performance of Rousseeuw’s method, it is shown that the “least median of squares” estimator is not an elemental solution (i.e., one fit exactly by p observations), but is instead determined by having exactly (p + 1) equal residuals when there are p parameters.
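As a rough illustration of what "fit exactly by extreme quantiles" means, the following sketch computes a regression quantile for a straight-line model (p = 2) by brute force. It assumes the standard check-function definition of Koenker and Bassett (1978) and relies on the fact that a minimizer can be found among elemental fits (lines through two observations). The function names and the toy data are hypothetical, and this is not the chapter's "peeling" procedure, only a way to see which observations an extreme quantile interpolates.

```python
from itertools import combinations


def check_loss(r, tau):
    """Check function: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return tau * r if r >= 0 else (tau - 1) * r


def regression_quantile(x, y, tau):
    """Brute-force tau-th regression quantile for the model y = a + b * x.

    Enumerates every line through two observations (elemental fits) and
    returns the intercept, slope, and index pair with the smallest total
    check loss. Cost is O(n^3), so this is for small examples only.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # two points with equal x do not define a line a + b * x
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        loss = sum(check_loss(yk - a - b * xk, tau) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, a, b, (i, j))
    return best[1], best[2], best[3]


# Toy data (made up for illustration): four points on y = x, one high outlier.
x = [0, 1, 2, 3, 4]
y = [0, 1, 2, 3, 10]

a_med, b_med, pair_med = regression_quantile(x, y, 0.5)  # median regression
a_hi, b_hi, pair_hi = regression_quantile(x, y, 0.9)     # an extreme quantile
```

On this toy data the median fit follows the four collinear points, while the 0.9 quantile line passes through the outlying observation (index 4), which is the kind of exact fit that a peeling scheme could flag and remove.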
This research was partially supported by NSF grant DMS 88-02555 and Air Force grant AFOSR 87-0041
AMS(MOS) subject classifications: primary 62J05, 62G35; secondary 62F10
References
Atkinson, A.C. (1986), Masking unmasked, Biometrika 73, 533–541.
Bassett, G.W., Koenker, R.W. (1982), An empirical quantile function for linear models with iid errors, J. Amer. Statist. Assoc., 77, 407–415.
Belsley, D.A., Kuh, E., Welsch, R.E. (1980), Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, Wiley, New York.
Cook, R.D., Weisberg, S. (1982), Residuals and Influence in Regression, Chapman and Hall, NY.
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., Stahel, W.A. (1986), Robust Statistics: the Approach Based on Influence Functions, Wiley, NY.
Hawkins, D.M., Bradu, D., Kass, G.V. (1984), Location of several outliers in multiple-regression data using elemental sets, Technometrics 26, 197–208.
Joss, J., Marazzi, A. (1990), Probabilistic algorithms for least median of squares regression, Comp. Stat. Data Anal., 9, 123–133.
Jurečková, J., Portnoy, S. (1987), Asymptotics for one-step M estimators in regression with application to combining efficiency and high breakdown point, Comm. Statist., Theory and Methods, 16, 2187–2200.
Koenker, R.W. (1987), A Comparison of Asymptotic Testing Methods for l1 regression, Statistical Data Analysis Based on the L1 Norm and Related Methods (ed: Y. Dodge), North Holland, Amsterdam, 287–295.
Koenker, R.W., Bassett, G.W. (1978), Regression quantiles, Econometrica, 46, 33–50.
Koenker, R.W., d’Orey, V. (1987), Computing Regression Quantiles, Applied Statist., 36, 383–393.
Koenker, R.W., Portnoy, S. (1987), L-Estimation for the Linear Model, J. Amer. Statist. Assoc., 82, 851–857.
Portnoy, S. (1984), Tightness of the sequence of cdf processes defined from regression fractiles, Robust and Nonlinear Time Series Analysis (eds: Franke, Härdle, Martin), Springer-Verlag, New York, 231–246.
Portnoy, S. (1987), Using regression fractiles to identify outliers, Statistical Data Analysis Based on the L1 Norm and Related Methods (ed: Y. Dodge), North Holland, Amsterdam, 345–356.
Portnoy, S. (1988), Asymptotic behavior of the number of regression quantile breakpoints, to appear: J. Sci. Statist. Computing.
Portnoy, S., Koenker, R.W. (1989), Adaptive L-estimation of linear models, Ann. Statist., 17, 362–381.
Rousseeuw, P. (1984), Least median of squares regression, J. Amer. Statist. Assoc. 79, 871–880.
Rousseeuw, P., Leroy, A. (1987), Robust Regression and Outlier Detection, Wiley, NY.
Rousseeuw, P., Yohai, V. (1984), Robust regression by means of S-estimates, Proc. of Workshop on Robust and Nonlinear Meth. in Time Series Analysis, Lecture Notes in Statistics, 26, Springer, 256–272.
Ruppert, D., Carroll, R.J. (1980), Trimmed least squares estimation in the linear model, J. Amer. Statist. Assoc., 75, 828–838.
Siegel, A.F. (1982), Robust regression using repeated medians, Biometrika, 69, 242–244.
Souvaine, D.L., Steele, J.M. (1987), Time and space efficient algorithms for least median of squares regression, J. Amer. Statist. Assoc., 82, 794–801.
Yohai, V. (1987), High breakdown-point and high efficiency robust estimates for regression, Ann. Statist. 15, 642–656.
Yohai, V., Zamar, R. (1988), High breakdown-point estimates of regression by means of the minimization of an efficient scale, J. Amer. Statist. Assoc., 83, 406–413.
Copyright information
© 1991 Springer-Verlag New York, Inc.
About this paper
Cite this paper
Portnoy, S. (1991). Regression Quantile Diagnostics for Multiple Outliers. In: Directions in Robust Statistics and Diagnostics. The IMA Volumes in Mathematics and its Applications, vol 34. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-4444-8_8
DOI: https://doi.org/10.1007/978-1-4612-4444-8_8
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4612-8772-8
Online ISBN: 978-1-4612-4444-8
eBook Packages: Springer Book Archive