Advances in self-organizing maps for their application to compositional data

  • Josep A. Martín-Fernández (corresponding author)
  • Mark A. Engle
  • Leslie F. Ruppert
  • Ricardo A. Olea
Original Paper

Abstract

A self-organizing map (SOM) is a non-linear projection of a D-dimensional data set onto a lower-dimensional space in which the distances among observations are approximately preserved. The SOM arranges multivariate data according to their similarity to each other, which supports pattern recognition and makes higher-dimensional data easier to interpret. The SOM algorithm allows for selection of different map topologies, distances, and parameters, which determine how the data will be organized on the map. In the particular case of compositional data (such as elemental, mineralogical, or maceral abundance), the sample space is governed by Aitchison geometry, and extra steps are required prior to SOM analysis. Following the principle of working on log-ratio coordinates, the simplicial operations and the Aitchison distance, which are the appropriate elements for the SOM, are presented. With this structure developed, a SOM using Aitchison geometry is applied to interpret elemental data from combustion products (bottom ash, fly ash, and economizer fly ash) in a Wyoming coal-fired power plant. The results provide knowledge about the differences among the ash compositions in the coal combustion process.
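As a minimal sketch of the elements named in the abstract, the Python code below computes the Aitchison distance through centered log-ratio (clr) coordinates and uses it for the two basic SOM steps: finding the best-matching unit and moving a unit toward a sample. It is an illustration under stated assumptions, not the authors' implementation. All function names are hypothetical, every part is assumed to be strictly positive (zeros would need to be imputed first), and the clr transform is used for convenience because Euclidean distances in clr coordinates equal Aitchison distances.

```python
import math

def closure(x, total=1.0):
    """Rescale positive parts so that they sum to `total`."""
    s = sum(x)
    return [total * v / s for v in x]

def clr(x):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the clr coordinates of x and y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))

def best_matching_unit(sample, codebook):
    """Index of the codebook composition closest to `sample`."""
    return min(range(len(codebook)),
               key=lambda i: aitchison_distance(sample, codebook[i]))

def update_unit(w, x, alpha):
    """Move unit w toward sample x by a fraction alpha (hypothetical helper).

    The step is linear in clr coordinates, which corresponds to
    perturbation and powering in the simplex; the result is mapped
    back to a closed composition.
    """
    cw, cx = clr(w), clr(x)
    moved = [a + alpha * (b - a) for a, b in zip(cw, cx)]
    return closure([math.exp(v) for v in moved])

# Illustrative three-part compositions (made-up values, not data from the paper).
codebook = [[0.3, 0.3, 0.4], [0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]
sample = [0.12, 0.18, 0.70]
bmu = best_matching_unit(sample, codebook)
codebook[bmu] = update_unit(codebook[bmu], sample, alpha=0.5)
```

Because the distance depends only on ratios between parts, rescaling a composition (for example from proportions to percentages) leaves it unchanged, which is the behavior a Euclidean distance on raw proportions does not provide.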

Keywords

Aitchison distance; Coal combustion products; Isometric logratio; Proportions; Simplex

Notes

Acknowledgements

This work has been supported by the project “CODA-RETOS” (Spanish Ministry of Economy and Competitiveness; Ref: MTM2015-65016-C2-1-R) and the project “Compositional Data Analysis Related to Energy Resources Modeling” (“Salvador de Madariaga” program; “Fulbright” distinction; MECD; Ref.: PRX16/00258). Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government. We are grateful to C.Ö. Karacan (USGS) and G. Mateu-Figueras (U. de Girona) for their insightful review of a previous version of the paper.

Copyright information

© This is a U.S. government work and its text is not subject to copyright protection in the United States; however, its text may be subject to foreign copyright protection 2019

Authors and Affiliations

  1. Department of Computer Science, Applied Mathematics and Statistics, University of Girona, Girona, Spain
  2. U.S. Geological Survey, Reston, USA
  3. Department of Geological Sciences, University of Texas at El Paso, El Paso, USA
