Cluster Analysis of Untargeted Metabolomic Experiments

  • Joshua Heinemann
Part of the Methods in Molecular Biology book series (MIMB, volume 1859)


Untargeted metabolite profiling based on LC-MS can be combined with mathematical clustering to identify unique metabolic phenotypes associated with stress, disease, or environmental exposure of cells. Here, we show that unsupervised data analysis is a powerful tool both for quality control and for answering simple biological questions. We demonstrate how to format untargeted mass spectrometry data for import into R, a programming language and software environment for statistical computing (R Development Core Team. R: A language and environment for statistical computing, reference index version 2.15. R Foundation for Statistical Computing, Vienna, 2012). Using R, we apply hierarchical clustering and principal component analysis (PCA) to untargeted metabolite data to create visual representations of change between biological samples, and we explore how these representations can be used predictively to assess environmental stress and health and to gain metabolic insight.
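The workflow summarized above might look roughly like the following minimal sketch in base R. It is not the chapter's protocol: the synthetic intensity matrix, the sample and feature names, the log2 transform with autoscaling, the Euclidean distance, and the average linkage are all assumptions chosen only for illustration.

```r
# Illustrative sketch only: synthetic data stand in for a real feature table.
set.seed(1)

# Assumed layout: rows are samples, columns are metabolite features,
# values are peak intensities. All names here are hypothetical.
n_features <- 40
control  <- matrix(rlnorm(4 * n_features, meanlog = 10, sdlog = 0.3), nrow = 4)
stressed <- matrix(rlnorm(4 * n_features, meanlog = 10, sdlog = 0.3), nrow = 4)
# Raise the first 20 features in the "stressed" group to mimic a response.
stressed[, 1:20] <- stressed[, 1:20] * 8

intensities <- rbind(control, stressed)
rownames(intensities) <- c(paste0("control_", 1:4), paste0("stressed_", 1:4))
colnames(intensities) <- paste0("feature_", seq_len(n_features))

# Log-transform, then center and scale each feature (an assumed choice).
scaled <- scale(log2(intensities))

# Hierarchical clustering of samples on Euclidean distances.
sample_dist <- dist(scaled, method = "euclidean")
sample_tree <- hclust(sample_dist, method = "average")
groups <- cutree(sample_tree, k = 2)  # cut the tree into two clusters

# PCA on the already scaled matrix; pca$x holds the sample scores.
pca <- prcomp(scaled, center = FALSE, scale. = FALSE)
scores <- pca$x[, 1:2]

# Visual representations: a dendrogram and a PC1 vs PC2 score plot.
plot(sample_tree, main = "Hierarchical clustering of samples")
plot(scores, col = groups, pch = 19, xlab = "PC1", ylab = "PC2")
```

With real data, the synthetic matrix would be replaced by a feature table read from disk, for example with `read.csv()` on a hypothetical file in which samples are rows and features are columns.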

Key words

Clustering; Cluster analysis; Pattern recognition; Untargeted metabolomics; Phenotyping; Data mining



The authors would also like to acknowledge that this work was part of the DOE Joint BioEnergy Institute, supported by the US Department of Energy, Office of Science, Office of Biological and Environmental Research, through contract DE-AC02-05CH11231 between Lawrence Berkeley National Laboratory and the US Department of Energy.


  1. R Development Core Team (2012) R: A language and environment for statistical computing, reference index version 2.15.1. R Foundation for Statistical Computing, Vienna
  2. Patti GJ, Tautenhahn R, Siuzdak G et al (2012) Meta-analysis of untargeted metabolomic data from multiple profiling experiments. Nat Protoc 7(3):508–516
  3. Patti GJ, Yanes O, Shriver LP, Courade J, Tautenhahn R, Manchester M, Siuzdak G et al (2012) Metabolomics implicates altered sphingolipids in chronic pain of neuropathic origin. Nat Chem Biol 8(3):232–234
  4. Everitt B (1974) Cluster analysis. Heinemann Educational Books, London
  5. Hartigan JA (1975) Clustering algorithms. Wiley, New York
  6. Anderberg MR (1973) Cluster analysis for applications. Academic Press, New York
  7. Murtagh F (1985) Multidimensional clustering algorithms. In: COMPSTAT Lectures 4. Physica-Verlag, Wuerzburg
  8. Becker RA, Chambers JM, Wilks AR (1988) The new S language. Wadsworth & Brooks/Cole Advanced Books & Software, Monterey
  9. Mardia KV, Kent JT, Bibby JM (1979) Multivariate analysis. Academic Press, London
  10. Venables WN, Ripley BD (2002) Modern applied statistics with S. Springer-Verlag, Berlin
  11. Heinemann J, Hamerly T, Maaty WS, Movahed N, Steffens JD, Reeves BD, Hilmer JK, Therien J, Grieco PA, Peters JW, Bothner B et al (2014) Expanding the paradigm of thiol redox in the thermophilic root of life. Biochim Biophys Acta 1840:80–85
  12. Maaty WS, Wiedenheft B, Tarlykov P, Schaff N, Heinemann J, Robison-Cox J, Valenzuela J, Bothner B et al (2009) Something old, something new, something borrowed; how the thermoacidophilic archaeon Sulfolobus solfataricus responds to oxidative stress. PLoS One 4(9):e6964
  13. Gordon AD (1999) Classification. Chapman and Hall / CRC, London
  14. McQuitty LL (1966) Similarity analysis by reciprocal pairs for discrete and continuous data. Educ Psychol Meas 26:825–831
  15. Kessner D, Chambers M, Burke R, Agus D, Mallick P et al (2008) ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics 24(21):2534–2536
  16. Tautenhahn R, Böttcher C, Neumann S et al (2008) Highly sensitive feature detection for high resolution LC/MS. BMC Bioinformatics 9:504
  17. Yanes O, Tautenhahn R, Patti GJ, Siuzdak G et al (2011) Expanding coverage of the metabolome for global metabolite profiling. Anal Chem 83(6):2152–2161

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, USA
  2. Joint BioEnergy Institute, Emeryville, USA
