Comparing normalization methods and the impact of noise
Failure to properly account for normal systematic variations in OMICS datasets may result in misleading biological conclusions. Accordingly, normalization is a necessary step in the proper preprocessing of OMICS datasets. In this regard, an optimal normalization method will effectively reduce unwanted biases and increase the accuracy of downstream quantitative analyses. However, it is currently unclear which normalization method is best, since each algorithm addresses systematic noise in a different way.
The objective of this study was to determine the optimal normalization method for the preprocessing of metabolomics datasets.
Nine normalization algorithms implemented in MVAPACK were compared using simulated and experimental NMR spectra modified with added Gaussian noise and random dilution factors. Methods were evaluated on their ability to recover the intensities of the true spectral peaks and to reproduce the true classifying features of an orthogonal projections to latent structures-discriminant analysis (OPLS-DA) model.
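The kind of perturbation described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' simulation protocol: the Lorentzian peak shape, the uniform dilution range, and defining the noise level as the Gaussian standard deviation relative to the largest true intensity are all assumptions, and the function names are hypothetical.

```python
import numpy as np


def lorentzian(ppm, center, width, height):
    """Lorentzian (Cauchy-shaped) peak; `width` is the half-width at half-maximum."""
    return height * width**2 / ((ppm - center) ** 2 + width**2)


def simulate_spectra(n_samples, ppm, centers, heights, width=0.005,
                     noise_level=0.1, dilution_range=(0.5, 2.0), seed=0):
    """Return (observed, true, dilution) for a hypothetical noise/dilution design.

    Every sample shares the same true spectrum, is multiplied by its own random
    dilution factor, and receives additive Gaussian noise whose standard
    deviation is `noise_level` times the largest true intensity.
    """
    rng = np.random.default_rng(seed)
    ppm = np.asarray(ppm, dtype=float)
    # Build the noise-free "true" spectrum as a sum of Lorentzian peaks.
    true = np.zeros_like(ppm)
    for center, height in zip(centers, heights):
        true += lorentzian(ppm, center, width, height)
    # One multiplicative dilution factor per sample (rows = samples).
    dilution = rng.uniform(*dilution_range, size=n_samples)
    diluted = dilution[:, None] * true[None, :]
    # Additive Gaussian noise scaled to the largest true intensity (assumption).
    noise = rng.normal(0.0, noise_level * true.max(), size=diluted.shape)
    return diluted + noise, true, dilution
```

For example, `simulate_spectra(10, np.linspace(0.0, 10.0, 2001), [1.0, 3.5, 7.0], [1.0, 0.5, 2.0], noise_level=0.2)` yields ten noisy, randomly diluted copies of a three-peak spectrum; repeating this over a range of `noise_level` values gives a noise sweep.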
Most normalization methods (except histogram matching) performed equally well at modest levels of signal variance. Only probabilistic quotient (PQ) and constant sum (CS) normalization maintained the highest levels of peak recovery (> 67%) and correlation with true loadings (> 0.6) at maximal noise.
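The two best-performing methods are simple to state. The sketch below implements CS and the widely used PQ recipe (integral normalization, a median reference spectrum, then division by each spectrum's median quotient against that reference). It is not MVAPACK's implementation; the function names and the choice to skip non-positive reference points are assumptions.

```python
import numpy as np


def constant_sum(X, total=1.0):
    """Constant sum (CS): scale each spectrum (row) so its intensities sum to `total`."""
    X = np.asarray(X, dtype=float)
    return total * X / X.sum(axis=1, keepdims=True)


def probabilistic_quotient(X, reference=None):
    """Probabilistic quotient (PQ) normalization of spectra stored as rows of X.

    Returns the normalized spectra and the estimated dilution factor per row.
    """
    # Step 1: integral (constant sum) normalization.
    Xcs = constant_sum(X)
    # Step 2: reference spectrum, here the median spectrum of all samples.
    # A supplied reference is assumed to be on the same constant-sum scale.
    if reference is None:
        reference = np.median(Xcs, axis=0)
    reference = np.asarray(reference, dtype=float)
    # Assumption: ignore points where the reference is not positive.
    mask = reference > 0
    # Step 3: point-wise quotients of each spectrum against the reference.
    quotients = Xcs[:, mask] / reference[mask]
    # Step 4: the median quotient is the estimated dilution factor.
    dilution = np.median(quotients, axis=1)
    # Step 5: divide each spectrum by its dilution factor.
    return Xcs / dilution[:, None], dilution
```

The median in step 4 is the design choice that separates the two methods: if a single metabolite rises sharply in one sample, CS shrinks every other feature of that sample to keep the total fixed, whereas the median quotient is barely affected, so the unchanged features stay aligned with the other samples.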
PQ and CS performed best at recovering peak intensities and reproducing the true classifying features of an OPLS-DA model, regardless of the spectral noise level. Our findings suggest that performance is largely determined by the level of noise in the dataset, while the effect of dilution factors is negligible. A maximal allowable noise level of 20% was also identified for a valid NMR metabolomics dataset.
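The two evaluation criteria can be illustrated with simple metrics. The abstract does not define them exactly, so both functions below are hypothetical stand-ins: agreement with the true classifying features is scored as the absolute Pearson correlation between estimated and true loading vectors, and peak recovery as the fraction of true peak intensities reproduced within a relative tolerance (`tol` is an assumed parameter).

```python
import numpy as np


def loading_correlation(estimated, true):
    """Absolute Pearson correlation between estimated and true loading vectors.

    The absolute value is used because the sign of a latent-variable
    loading vector is arbitrary.
    """
    r = np.corrcoef(np.ravel(estimated), np.ravel(true))[0, 1]
    return float(abs(r))


def peak_recovery(recovered, true, tol=0.1):
    """Fraction of true peak intensities reproduced within a relative tolerance.

    Hypothetical criterion: a peak counts as recovered when
    |recovered - true| <= tol * |true|.
    """
    recovered = np.asarray(recovered, dtype=float)
    true = np.asarray(true, dtype=float)
    hits = np.abs(recovered - true) <= tol * np.abs(true)
    return float(hits.mean())
```

For instance, `peak_recovery([1.0, 2.05, 5.0], [1.0, 2.0, 3.0])` counts the first two peaks as recovered and the third as missed, giving 2/3.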
Keywords: Metabolomics; Normalization; Noise; NMR; Preprocessing; Chemometrics
Abbreviations
Nuclear magnetic resonance
Principal components analysis
Orthogonal projections to latent structures-discriminant analysis
Standard normal variate
Multiplicative scatter correction
Natural cubic splines
Region of interest
LOcally Estimated Scatterplot Smoothing
Receiver operating characteristic curve
We thank Dr. Martha Morton, the Director of the Research Instrumentation Facility in the Department of Chemistry at the University of Nebraska-Lincoln, for her assistance with the NMR experiments. This material is based upon work supported by the National Science Foundation under Grant Number 1660921. This work was supported in part by funding from the Redox Biology Center (P30 GM103335, NIGMS) and the Nebraska Center for Integrated Biomolecular Communication (P20 GM113126, NIGMS). The research was performed in facilities renovated with support from the National Institutes of Health (RR015468-01). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
TV and ER performed the experiments; RP and YQ designed the experiments; TV, ER, YQ, and RP analyzed the data and wrote the manuscript.
Compliance with ethical standards
Conflict of interest
The authors have no conflicts of interest to declare.
This article does not contain any studies with human participants or animals performed by any of the authors.