Abstract
Despite the increasing popularity and applicability of metabolomics for putative biomarker identification, analysis of the data is challenged by low statistical power, which results from small sample sizes combined with large numbers of metabolites and other omics information, and by confounding demographic and clinical variables. To enhance statistical power and improve the reproducibility of identified metabolite-based biomarkers, we advocate the use of advanced statistical methods that can simultaneously evaluate the relationship between a group of metabolites and various types of variables, including other omics profiles and demographic and clinical data, as well as the complex interactions between them. Accordingly, in this chapter, we describe the method of seemingly unrelated regression, which can analyze multiple metabolites simultaneously while controlling for the confounding effects of demographic and clinical variables (such as gender, age, BMI, and smoking status). We also introduce penalized orthogonal-components regression as a screening approach that can handle millions of omics predictors in the model.
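To make the idea of seemingly unrelated regression concrete, the following is a minimal sketch of a two-step feasible generalized least squares fit in the spirit of Zellner (1962). It is not the chapter's own implementation. The function name sur_fgls, the simulated group, age, and BMI covariates, the two synthetic metabolites, and all coefficient values are assumptions chosen purely for illustration.

```python
import numpy as np


def sur_fgls(ys, Xs):
    """Two-step feasible GLS for a seemingly unrelated regression system.

    ys: list of m response vectors, each of length n (one per metabolite).
    Xs: list of m design matrices (n x p_j), each with its own covariates.
    Returns (list of coefficient vectors, estimated m x m residual covariance).
    """
    m = len(ys)
    n = len(ys[0])

    # Step 1: fit each equation by ordinary least squares and keep residuals.
    resid = np.column_stack([
        y - X @ np.linalg.lstsq(X, y, rcond=None)[0] for y, X in zip(ys, Xs)
    ])
    sigma = resid.T @ resid / n          # cross-equation residual covariance
    sigma_inv = np.linalg.inv(sigma)

    # Step 2: GLS on the stacked system, assembled block by block.
    # Block (i, j) of X'(Sigma^-1 kron I)X is sigma_inv[i, j] * X_i' X_j.
    p = [X.shape[1] for X in Xs]
    offsets = np.concatenate([[0], np.cumsum(p)])
    total = int(offsets[-1])
    A = np.zeros((total, total))
    b = np.zeros(total)
    for i in range(m):
        rows = slice(offsets[i], offsets[i + 1])
        for j in range(m):
            cols = slice(offsets[j], offsets[j + 1])
            A[rows, cols] = sigma_inv[i, j] * (Xs[i].T @ Xs[j])
            b[rows] += sigma_inv[i, j] * (Xs[i].T @ ys[j])
    beta = np.linalg.solve(A, b)
    return [beta[offsets[i]:offsets[i + 1]] for i in range(m)], sigma


# Synthetic illustration only: two metabolites with correlated errors.
rng = np.random.default_rng(42)
n = 60
group = np.repeat([0.0, 1.0], n // 2)    # hypothetical case/control indicator
age = rng.normal(55, 10, n)              # hypothetical confounder
bmi = rng.normal(27, 4, n)               # hypothetical confounder
errors = rng.multivariate_normal([0, 0], [[1.0, 0.7], [0.7, 1.0]], size=n)
met1 = 1.0 + 0.8 * group + 0.02 * age + errors[:, 0]
met2 = 2.0 + 0.5 * group + 0.05 * bmi + errors[:, 1]

X1 = np.column_stack([np.ones(n), group, age])   # metabolite 1: group + age
X2 = np.column_stack([np.ones(n), group, bmi])   # metabolite 2: group + BMI
betas, sigma = sur_fgls([met1, met2], [X1, X2])
print("metabolite 1 coefficients (intercept, group, age):", betas[0])
print("metabolite 2 coefficients (intercept, group, BMI):", betas[1])
print("estimated residual covariance:\n", sigma)
```

The gain over separate regressions comes from the off-diagonal entries of the residual covariance. When every equation uses exactly the same covariates, the estimates coincide with equation-by-equation ordinary least squares, so the sketch gives each metabolite its own covariate set.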
Acknowledgments
This research was partially supported by NIH (R03CA211831, P30CA015704-40) and NSF CAREER award IIS-0844945. The authors gratefully acknowledge the support of the Cancer Care Engineering (CCE) project, a joint effort between the Oncological Sciences Center (Purdue Center for Cancer Research, NCI P30CA23168) in Purdue University Discovery Park and the Indiana University Melvin and Bren Simon Cancer Center (NCI P30CA082709).
Copyright information
© 2019 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Zhang, D., Zhang, M. (2019). Advanced Statistical Methods for NMR-Based Metabolomics. In: Gowda, G., Raftery, D. (eds) NMR-Based Metabolomics. Methods in Molecular Biology, vol 2037. Humana, New York, NY. https://doi.org/10.1007/978-1-4939-9690-2_26
DOI: https://doi.org/10.1007/978-1-4939-9690-2_26
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-4939-9689-6
Online ISBN: 978-1-4939-9690-2
eBook Packages: Springer Protocols