Abstract
The paper presents a GMM-based molecular serum profiling framework for the complete analysis of MALDI-ToF mass spectrometry data. The MATLAB-based framework is a comprehensive, self-adapting solution that handles different kinds of spectral datasets. The mass spectrometry data analysis process consists of several procedures: data preparation and data pre-processing, which includes baseline correction, outlier detection and noise removal. The mean spectrum is then calculated, modeled with a Gaussian mixture model (GMM) and decomposed using the Expectation-Maximization (EM) algorithm. In this process, the peaks of the mean spectrum are localized with a dedicated adaptive procedure. In the subsequent step, the results of the mean spectrum decomposition are applied to each single spectrum in the dataset in the form of a Gaussian mask. The result is a dataset ready for further statistical analysis.
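The decomposition and Gaussian-mask steps described above can be illustrated with a minimal sketch. It is written in plain Python rather than in the framework's MATLAB and rests on several assumptions that do not come from the paper. First, the mean spectrum is treated as a weighted sample, so the intensities act as counts of the m/z positions in EM. Second, the paper's adaptive peak-localization procedure is replaced by a simple hypothetical local-maximum search (`find_initial_peaks`) used only to initialize the component means. Third, the Gaussian mask is taken to be the per-point component responsibilities, so a spectrum's feature for component k is its intensity summed with those weights. All function names and the synthetic two-peak data are illustrative.

```python
import math


def gaussian_pdf(x, mu, sd):
    """Normal density N(x | mu, sd^2)."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def find_initial_peaks(y, rel_height=0.1):
    """Hypothetical stand-in for the adaptive peak localization step:
    indices of local maxima higher than rel_height * max(y)."""
    floor = rel_height * max(y)
    return [i for i in range(1, len(y) - 1)
            if y[i] > y[i - 1] and y[i] >= y[i + 1] and y[i] > floor]


def responsibilities(x, w, mu, sd):
    """E-step: posterior share of each component at every m/z point."""
    resp = []
    for xi in x:
        dens = [wk * gaussian_pdf(xi, mk, sk) for wk, mk, sk in zip(w, mu, sd)]
        total = sum(dens) or 1e-300  # guard against full underflow
        resp.append([d / total for d in dens])
    return resp  # resp[i][k]


def em_gmm_spectrum(x, y, mu, sd, n_iter=50, min_sd=1e-3):
    """Fit a 1-D Gaussian mixture to a spectrum by EM, treating the
    intensities y as weights (counts) of the m/z positions x."""
    K = len(mu)
    w, mu, sd = [1.0 / K] * K, list(mu), list(sd)
    for _ in range(n_iter):
        resp = responsibilities(x, w, mu, sd)
        # M-step: intensity-weighted updates of weights, means and SDs
        nk = [max(sum(r[k] * yi for r, yi in zip(resp, y)), 1e-300) for k in range(K)]
        total = sum(nk)
        w = [n / total for n in nk]
        mu = [sum(r[k] * yi * xi for r, yi, xi in zip(resp, y, x)) / nk[k]
              for k in range(K)]
        sd = [max(math.sqrt(sum(r[k] * yi * (xi - mu[k]) ** 2
                                for r, yi, xi in zip(resp, y, x)) / nk[k]), min_sd)
              for k in range(K)]
    return w, mu, sd


def gaussian_mask_features(x, spectra, w, mu, sd):
    """Apply the mean-spectrum decomposition to single spectra:
    feature[s][k] = sum_i y_s[i] * resp[i][k]."""
    resp = responsibilities(x, w, mu, sd)
    return [[sum(yi * r[k] for yi, r in zip(y, resp)) for k in range(len(mu))]
            for y in spectra]


# Synthetic demo data (illustrative only): two peaks on an m/z grid
x = [1000.0 + 0.1 * i for i in range(1001)]
amps = [(1.0, 2.0), (1.5, 1.0), (0.5, 3.0)]  # per-spectrum peak areas
spectra = [[a * gaussian_pdf(xi, 1020.0, 2.0) + b * gaussian_pdf(xi, 1060.0, 3.0)
            for xi in x] for a, b in amps]
mean_spec = [sum(col) / len(spectra) for col in zip(*spectra)]

peaks = find_initial_peaks(mean_spec)
w, mu, sd = em_gmm_spectrum(x, mean_spec, [x[i] for i in peaks], [5.0] * len(peaks))
features = gaussian_mask_features(x, spectra, w, mu, sd)
print([round(m, 1) for m in mu], [round(s, 2) for s in sd])
```

Because the mixture is fitted once to the mean spectrum, every single spectrum is described by the same set of components. The rows of `features` therefore form a common sample-by-peak table of the kind the abstract calls a dataset ready for statistical analysis. In this sketch the features scale with the grid spacing (0.1), since they are sums rather than integrals.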
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Plechawska-Wójcik, M. (2015). GMM-Based Molecular Serum Profiling Framework. In: Dregvaite, G., Damasevicius, R. (eds) Information and Software Technologies. ICIST 2015. Communications in Computer and Information Science, vol 538. Springer, Cham. https://doi.org/10.1007/978-3-319-24770-0_6
DOI: https://doi.org/10.1007/978-3-319-24770-0_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24769-4
Online ISBN: 978-3-319-24770-0
eBook Packages: Computer Science, Computer Science (R0)