Mathematical Geosciences, Volume 40, Issue 3, pp 233–248

Outlier Detection for Compositional Data Using Robust Methods

  • Peter Filzmoser
  • Karel Hron


Outlier detection based on the Mahalanobis distance (MD) requires an appropriate transformation when applied to compositional data. For the family of logratio transformations (additive, centered, and isometric logratio transformations), it is shown that the MDs based on classical estimates are invariant to these transformations, and that the MDs based on affine equivariant estimators of location and covariance are the same for the additive and isometric logratio transformations. Moreover, for 3-dimensional compositions the data structure can be visualized by contour lines. In higher dimensions, the MDs of closed and opened data give an impression of the multivariate data behavior.
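The invariance of the classical MDs can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names, the simulated Dirichlet data, and the particular orthonormal basis used for the isometric logratio transformation are assumptions made for illustration. Because the covariance matrix of centered logratio data is singular, the sketch assumes the Moore-Penrose inverse in that case.

```python
import numpy as np


def alr(x):
    """Additive logratio: log of each part relative to the last part."""
    return np.log(x[:, :-1] / x[:, -1:])


def clr(x):
    """Centered logratio: log parts minus the row mean of the log parts."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)


def ilr(x):
    """Isometric logratio using one possible orthonormal basis (an assumption)."""
    d = x.shape[1]
    basis = np.zeros((d, d - 1))
    for i in range(1, d):
        basis[:i, i - 1] = 1.0 / i
        basis[i, i - 1] = -1.0
        basis[:, i - 1] *= np.sqrt(i / (i + 1.0))  # scale column to unit length
    return clr(x) @ basis


def classical_md_sq(z, singular=False):
    """Squared MDs from the arithmetic mean and the sample covariance matrix."""
    diff = z - z.mean(axis=0)
    cov = np.cov(z, rowvar=False)
    # For clr data the covariance is singular, so a pseudo-inverse is used.
    inv = np.linalg.pinv(cov, rcond=1e-10) if singular else np.linalg.inv(cov)
    return np.einsum("ij,jk,ik->i", diff, inv, diff)


# Simulated 4-part compositions (rows sum to 1); purely illustrative data.
rng = np.random.default_rng(0)
x = rng.dirichlet([4.0, 2.0, 3.0, 5.0], size=50)

md_alr = classical_md_sq(alr(x))
md_ilr = classical_md_sq(ilr(x))
md_clr = classical_md_sq(clr(x), singular=True)

print("alr vs ilr agree:", np.allclose(md_alr, md_ilr))
print("alr vs clr agree:", np.allclose(md_alr, md_clr))
```

For robust MDs, the sample mean and covariance in `classical_md_sq` would be replaced by an affine equivariant robust estimator; in that case the abstract claims agreement only between the additive and isometric logratio transformations.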


Mahalanobis distance · Robust statistics · Ternary diagram · Multivariate outliers · Logratio transformation





Copyright information

© International Association for Mathematical Geology 2008

Authors and Affiliations

  1. Dept. of Statistics and Probability Theory, Vienna University of Technology, Vienna, Austria
  2. Dept. of Mathematical Analysis and Applications of Mathematics, Palacký University Olomouc, Olomouc, Czech Republic
