A New Wavelet-Based Approach for Mass Spectrometry Data Classification

New Frontiers of Biostatistics and Bioinformatics

Part of the book series: ICSA Book Series in Statistics ((ICSABSS))

Abstract

Proteomic patterns can help diagnose the underlying pathological state of an organ such as the ovary, the lung, or the breast. Accurate classification of mass spectrometry data is crucial for establishing a reliable diagnosis and decision process regarding the type of cancer. We propose a statistical methodology for classifying mass spectrometry data and give an overview of wavelets, principal component analysis with the T² statistic, and support vector machines. The study is performed on low-mass SELDI spectra derived from patients with breast cancer and from normal controls. The data set contains 156 samples: 57 from control (normal) patients and 99 from cancer patients. A grid-search hyperparameter optimization is conducted to select the support vector machine classification model. Performance was evaluated with k-fold cross-validation and a Monte Carlo simulation with 100 replications. The average accuracy is 100% with a standard error of 0. The average sensitivity, specificity, and area under the curve are also all equal to 100%. We attribute this performance mainly to the statistical modeling and the feature extraction procedure proposed.
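To make the evaluation measures named in the abstract concrete, the following minimal sketch shows how accuracy, sensitivity, and specificity can be computed from true and predicted labels, and how a mean and standard error can be taken over Monte Carlo replications. It is not the chapter's code. The function names, the label coding (1 = cancer, 0 = control), the use of the standard error of the mean, and the placeholder predictions are all assumptions made for illustration; only the class sizes (99 cancer, 57 control) and the 100 replications come from the abstract.

```python
from math import sqrt
from statistics import mean, stdev


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (true positive rate), specificity (true negative rate).

    Assumes both classes are present in y_true, otherwise a division by zero occurs.
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def summarize(values):
    """Mean and standard error of the mean across replications."""
    m = mean(values)
    se = stdev(values) / sqrt(len(values)) if len(values) > 1 else 0.0
    return m, se


# Placeholder labels with the class sizes from the abstract (assumed coding:
# 1 = cancer, 0 = control). These are NOT the real spectra or predictions.
y_true = [1] * 99 + [0] * 57

# Hypothetical classifier output that agrees with every label.
y_pred_perfect = list(y_true)

# Hypothetical output with one cancer sample misclassified, for contrast.
y_pred_one_miss = [0] + y_true[1:]

# Stand-in for 100 Monte Carlo replications; a real study would re-split
# the data and refit the model in every replication.
rep_accuracy = [metrics(y_true, y_pred_perfect)["accuracy"] for _ in range(100)]
mean_acc, se_acc = summarize(rep_accuracy)
```

If every replication classifies all samples correctly, each accuracy equals 1 and the spread across replications is zero, which is the situation the abstract describes. With a single misclassified cancer sample, sensitivity drops to 98/99 while specificity stays at 1.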

Author information

Corresponding author

Correspondence to Achraf Cohen.

Copyright information

© 2018 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Cohen, A., Messaoudi, C., Badir, H. (2018). A New Wavelet-Based Approach for Mass Spectrometry Data Classification. In: Zhao, Y., Chen, DG. (eds) New Frontiers of Biostatistics and Bioinformatics. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-99389-8_8
