Study on Preprocessing and Classifying Mass Spectral Raw Data Concerning Human Normal and Disease Cases

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4345)

Abstract

Mass spectrometry is becoming an important tool in the biological sciences. Tissue samples or easily obtained biological fluids (serum, plasma, urine) are analysed by a variety of mass spectrometry methods, producing spectra characterized by very high dimensionality and a high level of noise. Here we address a feature extraction method for mass spectra that consists of two main steps. In the first step, an algorithm for low-level preprocessing of mass spectra is applied, including denoising with the Shift-Invariant Discrete Wavelet Transform (SIDWT), smoothing, baseline correction, peak detection and normalization of the resulting peak lists. After this step, we claim to have reduced the dimensionality and redundancy of the initial mass spectra representation while keeping all the meaningful features (potential biomarkers) required for disease-related proteomic patterns to be identified. In the second step, the peak lists are aligned and fed to a Support Vector Machine (SVM), which classifies the mass spectra. This procedure was applied to SELDI-QqTOF spectral data collected from normal and ovarian cancer serum samples. The classification performance was assessed for distinct values of the parameters involved in the feature extraction pipeline. The low-level preprocessing method described here results in 98.3% sensitivity, 98.3% specificity and an AUC (Area Under the Curve) of 0.981 in spectra classification.
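The first step of the abstract (denoising, baseline correction, peak detection, normalization) can be illustrated with a minimal sketch. It is not the authors' implementation: every function name, the single-level Haar wavelet, the sliding-minimum baseline, the max-intensity normalization and all parameter values are assumptions chosen for brevity. The separate smoothing step, peak alignment and the SVM are omitted.

```python
# Illustrative sketch only: hypothetical helpers, not the paper's code.

def soft_threshold(value, thr):
    """Shrink a coefficient toward zero by thr (soft thresholding)."""
    if value > thr:
        return value - thr
    if value < -thr:
        return value + thr
    return 0.0


def denoise_undecimated_haar(signal, thr):
    """One-level undecimated (shift-invariant) Haar denoising.

    Assumption: circular boundary handling and a single decomposition level.
    """
    n = len(signal)
    approx = [(signal[i] + signal[(i + 1) % n]) / 2.0 for i in range(n)]
    detail = [(signal[i] - signal[(i + 1) % n]) / 2.0 for i in range(n)]
    detail = [soft_threshold(d, thr) for d in detail]
    # Each sample is covered by two coefficient pairs; average both estimates.
    return [
        0.5 * ((approx[i] + detail[i]) + (approx[i - 1] - detail[i - 1]))
        for i in range(n)
    ]


def subtract_baseline(signal, half_window):
    """Estimate the baseline as a sliding-window minimum and subtract it."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        out.append(signal[i] - min(signal[lo:hi]))
    return out


def detect_peaks(mz, intensity, min_height):
    """Return (m/z, intensity) pairs for local maxima of at least min_height."""
    peaks = []
    for i in range(1, len(intensity) - 1):
        is_local_max = (intensity[i] > intensity[i - 1]
                        and intensity[i] >= intensity[i + 1])
        if is_local_max and intensity[i] >= min_height:
            peaks.append((mz[i], intensity[i]))
    return peaks


def normalize_peaks(peaks):
    """Scale intensities so that the largest peak equals 1."""
    top = max((h for _, h in peaks), default=0.0)
    if top == 0:
        return list(peaks)
    return [(m, h / top) for m, h in peaks]


def preprocess(mz, intensity, thr=0.5, half_window=3, min_height=1.0):
    """Raw spectrum -> normalized peak list (denoise, baseline, peaks, scale)."""
    denoised = denoise_undecimated_haar(intensity, thr)
    flattened = subtract_baseline(denoised, half_window)
    return normalize_peaks(detect_peaks(mz, flattened, min_height))


if __name__ == "__main__":
    # Synthetic spectrum: constant baseline of 10 with two peaks.
    mz = list(range(20))
    intensity = [10] * 20
    intensity[5], intensity[14] = 30, 20
    print(preprocess(mz, intensity, thr=0.0))
```

The resulting peak list is far shorter than the raw spectrum, which is the dimensionality reduction the abstract refers to. In a full pipeline, the threshold, window size and minimum height would be the kind of parameters whose values are varied when assessing classification performance.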

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Floros, X.E., Spyrou, G.M., Vougas, K.N., Tsangaris, G.T., Nikita, K.S. (2006). Study on Preprocessing and Classifying Mass Spectral Raw Data Concerning Human Normal and Disease Cases. In: Maglaveras, N., Chouvarda, I., Koutkias, V., Brause, R. (eds) Biological and Medical Data Analysis. ISBMDA 2006. Lecture Notes in Computer Science, vol 4345. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11946465_35

  • DOI: https://doi.org/10.1007/11946465_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68063-5

  • Online ISBN: 978-3-540-68065-9

  • eBook Packages: Computer Science, Computer Science (R0)
