Abstract
We describe empirical work on clustering and outlier detection for the analysis of European trade data. It is our first attempt to evaluate the benefits and limitations of the forward search approach to regression and multivariate analysis (Atkinson and Riani, Robust diagnostic regression analysis, Springer, 2000; Atkinson et al., Exploring multivariate data with the forward search, Springer, 2004) within a concrete application scenario and in relation to a comparable backward method developed at the JRC by Arsenis et al. (Price outliers in EU external trade data, Enlargement and Integration Workshop 2005, 2005). Our findings suggest that automatic clustering based on Mahalanobis distances may be inappropriate in the presence of a high-density area in the dataset. Follow-up work is discussed extensively in Riani et al. (Fitting mixtures of regression lines with the forward search, Mining massive data sets for security, IOS, 2008).
Notes
1. Unfortunately, the literature uses different terms for this form of standardised residual. Cook and Weisberg (1982) use “externally studentized residual”, in contrast to “internally studentized residual”, when the context refers to both forms of standardisation, with the current observation deleted or not. Belsley et al. (1980) use “studentized residual” or “RSTUDENT”. The terms “deletion residual” or “jackknife residual” are preferred by Atkinson and Riani (2000).
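To make the distinction between the two standardisations concrete, the following is a minimal sketch, not taken from the chapter, that computes both forms for an ordinary least squares fit with NumPy. The function name `studentized_residuals` and the variable names are hypothetical, and the design matrix is assumed to have full column rank and to already contain an intercept column if one is wanted.

```python
import numpy as np

def studentized_residuals(X, y):
    """Return (internal, external) studentized residuals for OLS of y on X.

    Illustrative helper (hypothetical name). Assumes X has full column
    rank and already includes an intercept column if one is wanted.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Leverages h_i: diagonal of the hat matrix, obtained from a thin QR.
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)

    # Raw residuals and the usual variance estimate from all n observations.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    s2 = e @ e / (n - p)

    # Internally studentized: observation i contributes to s2.
    internal = e / np.sqrt(s2 * (1.0 - h))

    # Externally studentized (deletion / jackknife residual): the variance
    # is estimated with observation i deleted, using the standard
    # leave-one-out identity instead of refitting n times.
    s2_deleted = ((n - p) * s2 - e**2 / (1.0 - h)) / (n - p - 1)
    external = e / np.sqrt(s2_deleted * (1.0 - h))
    return internal, external


# Usage on synthetic data (values are illustrative only).
rng = np.random.default_rng(0)
X_demo = np.column_stack([np.ones(20), rng.normal(size=20)])
y_demo = 1.0 + 2.0 * X_demo[:, 1] + rng.normal(size=20)
r_int, r_ext = studentized_residuals(X_demo, y_demo)
```

The deleted-variance identity avoids refitting the model once per observation. An observation that is poorly fitted inflates the full-sample variance estimate, so its deletion residual is larger in absolute value than its internally studentized counterpart.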
References
Arsenis, S., Perrotta, D., & Torti, F. (2005). Price outliers in EU external trade data. Internal note, presented at “Enlargement and Integration Workshop 2005”, http://theseus.jrc.it/events.html.
Atkinson, A. C., & Riani, M. (2000). Robust diagnostic regression analysis. New York: Springer.
Atkinson, A. C., Riani, M., & Cerioli, A. (2004). Exploring multivariate data with the forward search. New York: Springer.
Atkinson, A. C., Riani, M., & Cerioli, A. (2006). Random start forward searches with envelopes for detecting clusters in multivariate data. In S. Zani, A. Cerioli, M. Riani, & M. Vichi (eds.), Data analysis, classification and the forward search (pp. 163–172). Berlin: Springer.
Belsley, D. A., Kuh, E., & Welsch, R. E. (1980). Regression diagnostics: Identifying influential data and sources of collinearity. New York: Wiley.
Cook, R., & Weisberg, S. (1982). Residuals and influence in regression. New York: Chapman & Hall. Out of print, available at http://www.stat.umn.edu/rir/.
Riani, M., Cerioli, A., Atkinson, A., Perrotta, D., & Torti, F. (2008). Fitting mixtures of regression lines with the forward search. In F. Fogelman-Soulie, D. Perrotta, J. Piskorski, & R. Steinberger (eds.), Mining massive data sets for security (pp. 271–286). Amsterdam: IOS.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Perrotta, D., Torti, F. (2010). Detecting Price Outliers in European Trade Data with the Forward Search. In: Palumbo, F., Lauro, C., Greenacre, M. (eds) Data Analysis and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03739-9_47
DOI: https://doi.org/10.1007/978-3-642-03739-9_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03738-2
Online ISBN: 978-3-642-03739-9
eBook Packages: Mathematics and Statistics (R0)