Exploring the Periphery of Data Scatters: Are There Outliers?

Porzio, Giovanni C.; Ragozini, Giancarlo

doi:10.1007/978-3-642-59789-3_38

Conference paper

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

Outliers are observations that are markedly discordant with the rest of the data and therefore lie on the periphery of the data region. Many tools have been proposed in the literature for detecting multiple outliers. Most of the recent and attractive methods rely on some measure of the distance of each data point from a center. However, they are fully effective only if the data scatter is symmetric about that center; under asymmetry, these measures become misleading. For this reason, we propose a method that explores the periphery of the data scatter directly, without reference to any center. The methodology is a two-step procedure that exploits the sample convex hull and radial projections. It explores gaps in the data scatter and proximities to its boundary, highlighting how sparse the data structure is at its periphery. Finally, a complementary graphical display is offered as a tool for visualizing boundary features.
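
As a rough illustration of the convex hull ingredient mentioned in the abstract, the following Python sketch computes the sample convex hull of a two-dimensional scatter and the distance of every point to the hull boundary. It is not the authors' two-step procedure: the radial projection step is not reproduced here, and the function names (`convex_hull`, `dist_to_segment`, `boundary_proximity`) as well as the choice of Euclidean distance to the nearest hull edge as a proximity measure are assumptions made only for this example.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def dist_to_segment(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg2 = dx * dx + dy * dy
    if seg2 == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Clamp the projection parameter so the foot stays on the segment.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg2
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def boundary_proximity(points):
    """Map each point to its distance from the convex hull boundary.

    Illustrative measure only (an assumption of this sketch): hull vertices
    get 0.0, interior points get the distance to the nearest hull edge.
    """
    hull = convex_hull(points)
    edges = list(zip(hull, hull[1:] + hull[:1]))
    return {p: min(dist_to_segment(p, a, b) for a, b in edges) for p in points}


if __name__ == "__main__":
    scatter = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 2)]
    for point, d in boundary_proximity(scatter).items():
        print(point, round(d, 3))
```

Note that no center is estimated anywhere in this sketch: every quantity is defined relative to the boundary of the scatter, which is the general idea the abstract contrasts with center-based distance measures.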

Author information

Authors and Affiliations

Dipartimento di Scienze Statistiche, Università degli Studi di Napoli Federico II, Via G. Sanfelice 46, 80134, Napoli, Italy
Giovanni C. Porzio
Dipartimento di Matematica e Statistica, Università degli Studi di Napoli Federico II, Via Cintia, 80126, Napoli, Italy
Giancarlo Ragozini

Editor information

Editors and Affiliations

University of Groningen Heymans Institute (PA), Grote Kruisstraat 2/1, NL-9712 TS, Groningen, The Netherlands
Henk A. L. Kiers
Facultés Universitaires Notre-Dame de la Paix, University of Namur, Rempart de la Vierge, 8, B-5000, Namur, Belgium
Jean-Paul Rasson (Directeur du Department de Mathématique)
Data Theory Group Department of Education, Leiden University, P.O. Box 9555, NL-2300 RB, Leiden, The Netherlands
Patrick J. F. Groenen
Lehrstuhl für Wirtschaftsinformatik III Schloß, University of Mannheim, D-68131, Mannheim, Germany
Martin Schader

Cite this paper

Porzio, G.C., Ragozini, G. (2000). Exploring the Periphery of Data Scatters: Are There Outliers? In: Kiers, H.A.L., Rasson, J.-P., Groenen, P.J.F., Schader, M. (eds) Data Analysis, Classification, and Related Methods. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-59789-3_38

DOI: https://doi.org/10.1007/978-3-642-59789-3_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67521-1
Online ISBN: 978-3-642-59789-3
eBook Packages: Springer Book Archive