Abstract
We propose using Gaussian mixture decompositions of protein mass spectral signals to construct least squares estimators of peptide species concentrations in proteomic samples, and then using these estimators as spectral features in classifiers that distinguish cancer spectra from normal spectra. For a real dataset, we compare the variances of the least squares estimators with the variances of analogous estimators based on spectral peaks. We also evaluate spectral classifiers whose features are defined either by least squares estimators or by spectral peaks, measuring their power to differentiate between patterns specific to case and control samples of head and neck cancer patients. Cancer/normal classifiers based on spectral features defined by Gaussian components achieved lower average error rates than classifiers based on spectral peaks.
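The abstract does not spell out the estimator, so the following is only a minimal sketch of one way such a least squares step could look. It assumes the means and widths of the Gaussian components are already known from a prior mixture fit, and it treats the unknown concentrations as the linear weights of those components. The function names, the synthetic m/z grid, and all numeric values are hypothetical illustrations, not taken from the paper.

```python
import numpy as np


def gaussian_design_matrix(mz, means, sigmas):
    """Return a matrix whose column k is the k-th Gaussian density on the m/z grid."""
    mz = np.asarray(mz, dtype=float)[:, None]
    means = np.asarray(means, dtype=float)[None, :]
    sigmas = np.asarray(sigmas, dtype=float)[None, :]
    z = (mz - means) / sigmas
    return np.exp(-0.5 * z ** 2) / (sigmas * np.sqrt(2.0 * np.pi))


def ls_concentrations(mz, intensity, means, sigmas):
    """Ordinary least squares estimate of component weights (assumed concentrations)."""
    G = gaussian_design_matrix(mz, means, sigmas)
    coef, *_ = np.linalg.lstsq(G, np.asarray(intensity, dtype=float), rcond=None)
    return coef


# Synthetic spectrum fragment with three overlapping components (illustrative values).
rng = np.random.default_rng(0)
mz = np.linspace(2000.0, 2100.0, 1001)
means = np.array([2030.0, 2045.0, 2070.0])
sigmas = np.array([4.0, 5.0, 3.0])
true_conc = np.array([10.0, 25.0, 5.0])

clean = gaussian_design_matrix(mz, means, sigmas) @ true_conc
noisy = clean + rng.normal(scale=0.01, size=mz.size)

est = ls_concentrations(mz, noisy, means, sigmas)
print(est)
```

Because every point of the signal under each component contributes to the fit, an estimator of this kind uses more of the spectrum than a single peak height does, which is the intuition behind comparing its variance with that of peak-based estimators.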
Acknowledgements
This work was financially supported by the Polish National Science Centre grant UMO-2011/01/B/ST6/06868 (A.P.), the GeCONiI project number POIG.02.03.01-24-099/13 (M.M.) and an internal grant from the Silesian University of Technology BK/265/RAU-1/2014 t.10 (J.P.). All calculations were carried out using the GeCONiI infrastructure funded by project number POIG.02.03.01-24-099/13.
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
Polanski, A., Marczyk, M., Pietrowska, M., Widlak, P., Polanska, J. (2015). Least Squares Estimators of Peptide Species Concentrations Based on Gaussian Mixture Decompositions of Protein Mass Spectra. In: Steland, A., Rafajłowicz, E., Szajowski, K. (eds) Stochastic Models, Statistics and Their Applications. Springer Proceedings in Mathematics & Statistics, vol 122. Springer, Cham. https://doi.org/10.1007/978-3-319-13881-7_47
DOI: https://doi.org/10.1007/978-3-319-13881-7_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13880-0
Online ISBN: 978-3-319-13881-7
eBook Packages: Mathematics and Statistics (R0)