Abstract
Metabolomics is a relatively new field in bioinformatics that uses measurements of metabolite abundance as a tool for disease diagnosis and other medical purposes. Although metabolomics is closely related to proteomics, its statistical analysis is potentially simpler because biochemists have significantly more domain knowledge about metabolites. This chapter reviews the challenges that metabolomics poses in the areas of quality control, statistical metrology, and data mining.
Copyright information
© 2012 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Korman, A., Oh, A., Raskind, A., Banks, D. (2012). Statistical Methods in Metabolomics. In: Anisimova, M. (eds) Evolutionary Genomics. Methods in Molecular Biology, vol 856. Humana Press. https://doi.org/10.1007/978-1-61779-585-5_16
DOI: https://doi.org/10.1007/978-1-61779-585-5_16
Publisher Name: Humana Press
Print ISBN: 978-1-61779-584-8
Online ISBN: 978-1-61779-585-5
eBook Packages: Springer Protocols