Mathematical Geosciences, 41:905

Correlation Analysis for Compositional Data

  • Peter Filzmoser
  • Karel Hron

Abstract

Compositional data need special treatment prior to correlation analysis. In this paper we argue why standard transformations for compositional data are not suitable for computing correlations, and why the use of raw or log-transformed data is not meaningful either. As a solution, a procedure based on balances is outlined, leading to sensible correlation measures. The construction of the balances is demonstrated using a real data example from geochemistry. It is shown that the considered correlation measures are invariant with respect to the choice of the binary partitions forming the balances. Robust counterparts to the classical, non-robust correlation measures are introduced and applied. By using appropriate graphical representations, it is shown how the resulting correlation coefficients can be interpreted.
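
As a rough illustration of what a balance is, the sketch below assumes the usual definition of a balance between two disjoint groups of parts: the log-ratio of the groups' geometric means, scaled by sqrt(r*s/(r+s)), where r and s are the group sizes. The function names `balance` and `pearson` and the toy data are hypothetical and are not taken from the paper, and the choice of groups is not necessarily the construction the authors use.

```python
import math


def balance(composition, numerator_idx, denominator_idx):
    """Balance between two disjoint groups of parts of one composition.

    Assumed definition: sqrt(r*s/(r+s)) * ln(gm(numerator) / gm(denominator)),
    where gm is the geometric mean and r, s are the group sizes.
    """
    r, s = len(numerator_idx), len(denominator_idx)
    # Mean of logs equals the log of the geometric mean of each group.
    log_gm_num = sum(math.log(composition[i]) for i in numerator_idx) / r
    log_gm_den = sum(math.log(composition[j]) for j in denominator_idx) / s
    return math.sqrt(r * s / (r + s)) * (log_gm_num - log_gm_den)


def pearson(u, v):
    """Classical (non-robust) Pearson correlation of two equal-length lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


# Hypothetical 4-part compositions (each row sums to 1); illustrative only.
samples = [
    [0.10, 0.20, 0.30, 0.40],
    [0.20, 0.20, 0.30, 0.30],
    [0.40, 0.10, 0.25, 0.25],
    [0.25, 0.25, 0.40, 0.10],
]

# One balance inside parts {0, 1} and one inside parts {2, 3}.
b1 = [balance(x, [0], [1]) for x in samples]
b2 = [balance(x, [2], [3]) for x in samples]
print("correlation between the two balances:", pearson(b1, b2))
```

Because a balance depends only on ratios of parts, multiplying a composition by any positive constant leaves it unchanged. This is one reason correlations computed on balances do not depend on the units or the closure constant of the data.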

Keywords

Correlation analysis, Ilr transformation, Log-ratio transformation, Compositional data, Balances, Subcompositions, Amalgamation, Robust statistics

Copyright information

© International Association for Mathematical Geosciences 2008

Authors and Affiliations

  1. Dept. of Statistics and Probability Theory, Vienna University of Technology, Vienna, Austria
  2. Dept. of Mathematical Analysis and Applications of Mathematics, Palacký University Olomouc, Olomouc, Czech Republic
