Exploring the Periphery of Data Scatters: Are There Outliers?

  • Giovanni C. Porzio
  • Giancarlo Ragozini
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

Outliers are observations that are markedly discordant with the rest of the data and hence lie on the periphery of the data region. Many tools have been proposed in the literature for detecting multiple outliers. Most of the recent and attractive methods are based on some measure of the distance of each data point from a center. However, they are effective only if the data scatter is symmetric about that center; otherwise, asymmetry makes these measures misleading. For this reason, we propose a method that allows direct exploration of the periphery of the data scatter without reference to any center. The methodology is a two-step procedure that exploits the sample convex hull and radial projections. It explores gaps in the data scatter and proximities to its boundary, highlighting how sparse the data structure is at its periphery. Finally, a complementary graphical display is offered as a tool for visualizing boundary features.
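As a rough illustration of what "exploring the periphery without a center" can mean, the sketch below finds the vertices of the sample convex hull of a small two-dimensional data set and, for each vertex, reports the distance to its nearest other observation. This is not the authors' two-step procedure: the radial-projection step is not specified in the abstract and is not attempted here. The function names (`convex_hull`, `nearest_neighbour_gap`), the nearest-neighbour gap measure, and the toy sample are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def convex_hull(points):
    """Return the convex hull vertices (counter-clockwise) of 2-D points.

    Uses Andrew's monotone chain algorithm; collinear points are dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other one
    return lower[:-1] + upper[:-1]


def nearest_neighbour_gap(vertex, points):
    """Distance from a hull vertex to the closest other observation
    (a hypothetical, deliberately simple gap measure)."""
    return min(dist(vertex, q) for q in points if q != vertex)


# Toy sample (made up): a tight cluster plus one remote observation
sample = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (0.4, 0.6), (5, 5)]

hull = convex_hull(sample)
gaps = {v: nearest_neighbour_gap(v, sample) for v in hull}
most_isolated = max(gaps, key=gaps.get)

for v in hull:
    print(v, round(gaps[v], 3))
print("most isolated hull vertex:", most_isolated)
```

No mean or covariance matrix is estimated at any point: only the boundary of the scatter and the gaps next to it are inspected, which is the general idea the abstract contrasts with distance-from-center measures.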

Keywords

Data Region; Mahalanobis Distance; Outlier Detection; Data Cloud; Royal Statistical Society
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin · Heidelberg 2000

Authors and Affiliations

  • Giovanni C. Porzio (1)
  • Giancarlo Ragozini (2)
  1. Dipartimento di Scienze Statistiche, Università degli Studi di Napoli Federico II, Napoli, Italy
  2. Dipartimento di Matematica e Statistica, Università degli Studi di Napoli Federico II, Napoli, Italy