Abstract
In recent years, mass spectrometry has helped make proteomics a powerful tool for the early diagnosis of cancer by uncovering protein profiles specific to each pathological state. One problem where proteomics is giving useful practical results is the classification of patients into one of several severity levels of an illness, based on features measured on the patient. This classification is usually made with one of the many discrimination procedures available in the statistical literature. In this chapter we present recently developed restricted discriminant rules that use additional information in the form of orderings on the means, and we illustrate how to apply them to mass spectrometry data using the R package dawai. Specifically, we use proteomic prostate cancer data and describe all the steps needed, including data preprocessing and feature extraction, to build a discriminant rule that classifies samples into one of several disease stages, thus helping diagnosis. The restricted discriminant rules are compared with some standard classifiers that do not take the additional information into account, and they show better performance in terms of error rates.
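As a rough illustration of what "orderings on the means" can buy, the following Python sketch estimates class means for a single feature under the assumed restriction that the mean does not decrease with disease stage, then assigns a new value to the nearest restricted mean. It is a univariate toy under stated assumptions (equal variances and priors, a simple increasing order, made-up numbers), not the multivariate rules of the chapter and not the dawai interface; the function names `pava_increasing` and `classify` are hypothetical.

```python
from typing import List, Sequence


def pava_increasing(means: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Weighted least-squares projection of `means` onto m_1 <= ... <= m_k
    (pool-adjacent-violators algorithm)."""
    blocks = []  # each block: [pooled value, total weight, number of groups]
    for m, w in zip(means, weights):
        blocks.append([float(m), float(w), 1])
        # Merge backwards while adjacent blocks violate the assumed ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, n2 = blocks.pop()
            v1, w1, n1 = blocks.pop()
            blocks.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    fitted = []
    for value, _, count in blocks:
        fitted.extend([value] * count)
    return fitted


def classify(x: float, class_means: Sequence[float]) -> int:
    """Index of the closest class mean (assumes equal variances and priors).
    Ties go to the lowest index."""
    return min(range(len(class_means)), key=lambda j: abs(x - class_means[j]))


# Made-up data: one feature measured in three ordered disease stages.
samples = [[1.0, 1.4, 1.2], [2.3, 2.1], [1.9, 2.0, 2.2, 1.9]]
raw_means = [sum(s) / len(s) for s in samples]
sizes = [len(s) for s in samples]

# The raw means of stages 2 and 3 are out of order, so they are pooled.
restricted_means = pava_increasing(raw_means, sizes)
print(raw_means, restricted_means, classify(1.3, restricted_means))
```

In this toy the two pooled stages share a mean and cannot be separated by this one feature; separating them would need further features or a multivariate rule.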
Copyright information
© 2016 Springer Science+Business Media New York
About this protocol
Cite this protocol
Conde, D., Fernández, M.A., Salvador, B., Rueda, C. (2016). Classification of Samples with Order-Restricted Discriminant Rules. In: Jung, K. (eds) Statistical Analysis in Proteomics. Methods in Molecular Biology, vol 1362. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3106-4_10
DOI: https://doi.org/10.1007/978-1-4939-3106-4_10
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-3105-7
Online ISBN: 978-1-4939-3106-4
eBook Packages: Springer Protocols