Robust multivariate analysis of compositional data of treated wastewaters

  • Petr Praus
Original Article


A dataset of water samples collected downstream of a biological wastewater treatment plant (BWWTP) over one year was processed as compositional data by a log-ratio transformation and then analysed by robust principal component analysis (RPCA) and robust Mahalanobis distances (RMDs). For this purpose, covariance matrices were computed using a minimum covariance determinant (MCD) algorithm. The 11 physico-chemical parameters, both raw and transformed, were reduced to 4 robust principal components (RPCs). Correlations between the centred log-ratio (clr)-transformed parameters and the RPCs were found to be more realistic than those between the raw parameters and their RPCs. The first and second RPCs represented nitrogen and phosphorus compounds, respectively. Their temporal changes were explained by processes occurring during biological wastewater treatment. A nitrification process was also demonstrated by the temporal changes of the raw and clr-transformed concentrations of ammonium. The robust and classical Mahalanobis distances were computed from the raw and isometric log-ratio (ilr)-transformed data to show the overall temporal changes of treated wastewater composition and to detect outlying samples.
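The two log-ratio transformations named in the abstract can be illustrated with a minimal Python sketch. It is not the author's MATLAB code: the function names and the sample concentrations are hypothetical, and the ilr basis used here (pivot coordinates) is only one common choice of orthonormal basis, which the abstract does not specify. The MCD estimation and the robust distances are not shown.

```python
import math


def clr(parts):
    """Centred log-ratio: log of each part divided by the geometric mean."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def ilr(parts):
    """Isometric log-ratio in pivot coordinates: D parts -> D - 1 coordinates.

    Assumption: pivot coordinates are used as the orthonormal basis;
    other bases give different coordinates with the same distances.
    """
    logs = [math.log(p) for p in parts]
    coords = []
    for i in range(len(logs) - 1):
        rest = logs[i + 1:]          # logs of the parts after part i
        k = len(rest)
        scale = math.sqrt(k / (k + 1))
        coords.append(scale * (logs[i] - sum(rest) / k))
    return coords


def norm(vector):
    """Euclidean norm of a coordinate vector."""
    return math.sqrt(sum(v * v for v in vector))


# Hypothetical concentrations (e.g. mg/L) of four parameters in one sample.
sample = [12.0, 3.5, 0.8, 25.0]

clr_coords = clr(sample)
ilr_coords = ilr(sample)

print("clr:", [round(v, 4) for v in clr_coords])
print("ilr:", [round(v, 4) for v in ilr_coords])
```

Both transformations depend only on ratios between parts, so multiplying every concentration by the same factor leaves the coordinates unchanged. The clr coordinates always sum to zero, which makes their covariance matrix singular. The ilr coordinates have one dimension fewer and a covariance matrix that is generally non-singular, which is consistent with the abstract computing the Mahalanobis distances from ilr-transformed rather than clr-transformed data.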


Compositional data · Log-ratio transformation · Treated wastewaters · Multivariate analysis



The author thanks Dr. Zdeněk Matěj (Lund University, Sweden) for his help with the MATLAB subroutines. This work was financially supported by the project “Institute of Environmental Technology—Excellent Research” (CZ.02.1.01/0.0/0.0/16_019/0000853) provided by the Ministry of Education, Youth and Sports of the Czech Republic.

Supplementary material

12665_2019_8248_MOESM1_ESM.docx (177 kb)
Supplementary material 1 (DOCX 180 kb)



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Institute of Environmental Technology and Department of Chemistry, VŠB-Technical University of Ostrava, Ostrava, Poruba, Czech Republic
