14:108

Comparing normalization methods and the impact of noise

  • Thao Vu
  • Eli Riekeberg
  • Yumou Qiu
  • Robert Powers
Original Article



Failure to properly account for normal systematic variations in OMICS datasets may result in misleading biological conclusions. Accordingly, normalization is a necessary step in the proper preprocessing of OMICS datasets. In this regard, an optimal normalization method will effectively reduce unwanted biases and increase the accuracy of downstream quantitative analyses. However, it is currently unclear which normalization method is best, since each algorithm addresses systematic noise in different ways.


To determine the optimal normalization method for the preprocessing of metabolomics datasets.


Nine MVAPACK normalization algorithms were compared using simulated and experimental NMR spectra modified with added Gaussian noise and random dilution factors. Methods were evaluated on their ability to recover the intensities of the true spectral peaks and on the reproducibility of the true classifying features from an orthogonal projections to latent structures-discriminant analysis (OPLS-DA) model.
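The simulation step can be pictured with a small sketch. The abstract does not say how the dilution factors or the noise amplitude were parameterized, so everything below is an assumption for illustration and is not MVAPACK code: a hypothetical `simulate_spectrum` helper multiplies a noise-free spectrum by one uniformly drawn dilution factor and then adds Gaussian noise whose standard deviation is a fraction (`noise_level`) of the tallest true peak, while a hypothetical `toy_spectrum` helper builds a synthetic spectrum from Gaussian peaks.

```python
import math
import random


def toy_spectrum(n_points=200, peaks=((50, 1.0, 3.0), (140, 0.6, 4.0))):
    """Noise-free 1D spectrum built from Gaussian peaks given as (center, height, width)."""
    return [
        sum(h * math.exp(-0.5 * ((i - c) / w) ** 2) for c, h, w in peaks)
        for i in range(n_points)
    ]


def simulate_spectrum(true_spectrum, noise_level, dilution_range=(0.5, 1.5), rng=None):
    """Return (noisy spectrum, dilution factor) for one simulated sample.

    Hypothetical scheme: one dilution factor drawn uniformly from
    dilution_range scales every point, then independent Gaussian noise with
    standard deviation noise_level * max(true_spectrum) is added per point.
    """
    rng = rng or random.Random()
    dilution = rng.uniform(*dilution_range)
    sigma = noise_level * max(true_spectrum)
    noisy = [dilution * x + rng.gauss(0.0, sigma) for x in true_spectrum]
    return noisy, dilution


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the simulated dataset is reproducible
    truth = toy_spectrum()
    dataset = [simulate_spectrum(truth, noise_level=0.2, rng=rng) for _ in range(10)]
    print(len(dataset), len(dataset[0][0]))  # prints: 10 200
```

Drawing one scalar per spectrum models dilution as a purely multiplicative effect, whereas the added noise varies point by point; that separation is what lets a normalization method undo the first but not the second.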


Most normalization methods (except histogram matching) performed equally well at modest levels of signal variance. Only probabilistic quotient (PQ) and constant sum (CS) maintained the highest level of peak recovery (> 67%) and correlation with true loadings (> 0.6) at maximal noise.
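The two evaluation criteria can be sketched as follows. Pearson correlation is a standard way to compare a model's loadings with the true loadings; taking its absolute value is an assumption made here because the sign of a latent-variable loading is arbitrary. The peak-recovery rule (a true peak counts as recovered when its normalized height lies within a relative tolerance `rel_tol` of the true height) is a hypothetical stand-in, since the abstract does not define the criterion. Only the 67% and 0.6 thresholds come from the text above.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def loading_similarity(true_loadings, model_loadings):
    """|r| between loading vectors (assumption: loading sign is ignored)."""
    return abs(pearson(true_loadings, model_loadings))


def peak_recovery(true_heights, recovered_heights, rel_tol=0.1):
    """Fraction of true peaks whose recovered height is within rel_tol (hypothetical criterion)."""
    hits = sum(
        1
        for t, r in zip(true_heights, recovered_heights)
        if abs(r - t) <= rel_tol * abs(t)
    )
    return hits / len(true_heights)


def meets_reported_thresholds(recovery, similarity):
    """Thresholds quoted in the abstract: >67% peak recovery and >0.6 correlation."""
    return recovery > 0.67 and similarity > 0.6


if __name__ == "__main__":
    recovery = peak_recovery([1.0, 2.0, 4.0], [1.05, 2.5, 4.2])
    similarity = loading_similarity([0.1, 0.5, -0.3, 0.8], [-0.12, -0.45, 0.25, -0.7])
    print(round(recovery, 3), round(similarity, 3), meets_reported_thresholds(recovery, similarity))
```

With these toy numbers two of the three peaks fall within 10%, so recovery is about 0.667 and just misses the 67% threshold even though the loadings correlate almost perfectly; the two criteria can disagree, which is why both are reported.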


PQ and CS performed best at recovering peak intensities and reproducing the true classifying features of an OPLS-DA model, regardless of spectral noise level. Our findings suggest that performance is largely determined by the level of noise in the dataset, while the effect of dilution factors was negligible. A noise level of 20% was also identified as the allowable limit for a valid NMR metabolomics dataset.
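For reference, here is a minimal pure-Python sketch of CS and PQ normalization as they are commonly described; it is not the MVAPACK implementation. CS rescales each spectrum to a fixed total intensity. PQ first applies CS, builds a reference spectrum from the point-wise median of all spectra, and divides each spectrum by the median of its point-wise quotients against that reference, treating that median as the most probable dilution factor. The `min_ref` cutoff that skips near-zero reference points is an assumption; in practice the quotients are often restricted to signal regions.

```python
from statistics import median


def constant_sum(spectra, total=1.0):
    """Constant-sum (CS) normalization: scale each spectrum to sum to `total`."""
    normalized = []
    for spectrum in spectra:
        area = sum(spectrum)  # total intensity of this spectrum
        normalized.append([total * x / area for x in spectrum])
    return normalized


def probabilistic_quotient(spectra, min_ref=1e-8):
    """Probabilistic quotient (PQ) normalization sketch.

    min_ref is an assumed cutoff: reference points with |value| <= min_ref
    are skipped so that no quotient divides by (nearly) zero.
    """
    # Step 1: integral (constant-sum) normalization.
    spectra = constant_sum(spectra)
    n_points = len(spectra[0])
    # Step 2: reference spectrum = point-wise median across all spectra.
    reference = [median(s[j] for s in spectra) for j in range(n_points)]
    normalized = []
    for s in spectra:
        # Step 3: point-wise quotients of this spectrum over the reference.
        quotients = [s[j] / reference[j] for j in range(n_points) if abs(reference[j]) > min_ref]
        # Step 4: the median quotient estimates the most probable dilution factor.
        factor = median(quotients)
        normalized.append([x / factor for x in s])
    return normalized


if __name__ == "__main__":
    base = [1.0, 3.0, 0.0, 6.0]
    # The same metabolite profile at three different dilutions.
    diluted = [[2.0 * x for x in base], [0.5 * x for x in base], base]
    for row in probabilistic_quotient(diluted):
        print([round(x, 3) for x in row])  # each row prints [0.1, 0.3, 0.0, 0.6]
```

Because both methods divide each spectrum by a single scalar, a purely multiplicative dilution is removed exactly, which is consistent with dilution factors having a negligible effect compared with noise. The methods differ when a few metabolites change strongly: the total area used by CS shifts with those metabolites, whereas the median quotient used by PQ is less sensitive to them.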


Keywords: Metabolomics, Normalization, Noise, NMR, Preprocessing, Chemometrics



Abbreviations

  • Nuclear magnetic resonance
  • Principal components analysis
  • Orthogonal projections to latent structures-discriminant analysis
  • Probabilistic quotient
  • Histogram matching
  • Standard normal variate
  • Multiplicative scatter correction
  • Natural cubic splines
  • Smoothing splines
  • Constant sum
  • Region of interest
  • Phase-scatter correction
  • LOcally Estimated Scatterplot Smoothing
  • Receiver operating characteristic curve
  • Standard deviation



Acknowledgements

We thank Dr. Martha Morton, the Director of the Research Instrumentation Facility in the Department of Chemistry at the University of Nebraska-Lincoln, for her assistance with the NMR experiments. This material is based upon work supported by the National Science Foundation under Grant Number 1660921. This work was supported in part by funding from the Redox Biology Center (P30 GM103335, NIGMS) and the Nebraska Center for Integrated Biomolecular Communication (P20 GM113126, NIGMS). The research was performed in facilities renovated with support from the National Institutes of Health (RR015468-01). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Author contributions

TV and ER performed the experiments; RP and YQ designed the experiments; TV, ER, YQ, and RP analyzed the data and wrote the manuscript.

Compliance with ethical standards

Conflict of interest

The authors have no conflicts of interest to declare.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Supplementary material

Supplementary material 1 (PDF, 580 KB): 11306_2018_1400_MOESM1_ESM.pdf



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Statistics, University of Nebraska-Lincoln, Lincoln, USA
  2. Department of Chemistry, University of Nebraska-Lincoln, Lincoln, USA
  3. Nebraska Center for Integrated Biomolecular Communication, Lincoln, USA
