
GMM-Based Molecular Serum Profiling Framework

  • Conference paper
Information and Software Technologies (ICIST 2015)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 538)


Abstract

This paper presents a GMM-based molecular serum profiling framework for the complete analysis of MALDI-ToF mass spectrometry data. The Matlab-based framework is a comprehensive, self-adapting solution designed to handle different kinds of spectral datasets. The analysis consists of several procedures, such as data preparation and data pre-processing, including baseline correction, outlier detection, and noise removal. The mean spectrum is then calculated, modeled with a Gaussian mixture model (GMM), and decomposed using the Expectation-Maximization algorithm. During this process, the peaks of the mean spectrum are localized with a dedicated adaptive procedure. In the subsequent step, the results of the mean-spectrum decomposition are applied to every single spectrum in the dataset in the form of a Gaussian mask. The result is a dataset ready for further statistical analysis.
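The core steps named in the abstract (mean spectrum, GMM decomposition with Expectation-Maximization, Gaussian mask) can be illustrated with a minimal sketch. It is written in Python rather than Matlab and is not the framework's code. It assumes that all spectra share one m/z grid, that intensities can be treated as weights of a binned sample, and that the "Gaussian mask" means one Gaussian-weighted sum per fitted component; the function names and the fixed starting values are hypothetical and stand in for the adaptive peak localization, which is not reproduced here.

```python
import math


def gaussian(x, mu, sigma):
    """Normal probability density at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mean_spectrum(spectra):
    """Point-wise average of spectra sampled on a common m/z grid."""
    return [sum(column) / len(spectra) for column in zip(*spectra)]


def em_decompose(mz, intensity, mus, sigmas, n_iter=200):
    """Fit a 1-D Gaussian mixture to a spectrum with EM.

    Assumption: each intensity acts as the weight (count) of its m/z bin.
    mus and sigmas are initial guesses, e.g. from a prior peak search.
    """
    k = len(mus)
    mus, sigmas = list(mus), list(sigmas)
    weights = [1.0 / k] * k
    total = sum(intensity)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each m/z point.
        resp = []
        for x in mz:
            p = [w * gaussian(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
            norm = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pj / norm for pj in p])
        # M-step: intensity-weighted updates of mean, spread, and weight.
        for j in range(k):
            nj = sum(y * r[j] for y, r in zip(intensity, resp))
            if nj <= 0.0:
                continue
            mus[j] = sum(y * r[j] * x for x, y, r in zip(mz, intensity, resp)) / nj
            var = sum(y * r[j] * (x - mus[j]) ** 2
                      for x, y, r in zip(mz, intensity, resp)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-6)
            weights[j] = nj / total
    return weights, mus, sigmas


def apply_gaussian_mask(mz, intensity, mus, sigmas):
    """One feature per component: the spectrum weighted by that Gaussian."""
    return [sum(y * gaussian(x, m, s) for x, y in zip(mz, intensity))
            for m, s in zip(mus, sigmas)]


if __name__ == "__main__":
    # Synthetic two-peak spectra, used only to exercise the functions.
    mz = [float(i) for i in range(100)]

    def synth(a, b):
        return [a * gaussian(x, 20.0, 3.0) + b * gaussian(x, 60.0, 5.0) for x in mz]

    spectra = [synth(100.0, 50.0), synth(60.0, 90.0)]
    w, mu, sd = em_decompose(mz, mean_spectrum(spectra), [15.0, 70.0], [5.0, 5.0])
    for s in spectra:
        print(apply_gaussian_mask(mz, s, mu, sd))
```

Fitting the mixture once on the mean spectrum and reusing the components for every spectrum gives each sample the same set of features, which is what makes the resulting table suitable for later statistical comparison.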




Corresponding author

Correspondence to Małgorzata Plechawska-Wójcik.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Plechawska-Wójcik, M. (2015). GMM-Based Molecular Serum Profiling Framework. In: Dregvaite, G., Damasevicius, R. (eds) Information and Software Technologies. ICIST 2015. Communications in Computer and Information Science, vol 538. Springer, Cham. https://doi.org/10.1007/978-3-319-24770-0_6


  • DOI: https://doi.org/10.1007/978-3-319-24770-0_6


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24769-4

  • Online ISBN: 978-3-319-24770-0

  • eBook Packages: Computer Science, Computer Science (R0)
