Analysis of Spectral Data in Clinical Proteomics by Use of Learning Vector Quantizers

  • Chapter
Computational Intelligence in Biomedicine and Bioinformatics

Summary

Clinical proteomics based on mass spectrometry has gained tremendous visibility in the scientific and clinical community. Machine learning methods are key to the efficient processing of the complex data. One major class is prototype-based algorithms. Prototype-based vector quantizers and classifiers are intuitive approaches that realize the principle of characteristic representatives for data subsets or for the decision regions between them. Examples of such tools are Support Vector Machines (SVMs) [1], Kohonen's Learning Vector Quantization (LVQ) [2], Self-Organizing Maps (SOMs) [2], Supervised Relevance Neural Gas (SRNG) [3], and their respective variants. Depending on the task, one can distinguish between unsupervised methods for data representation and supervised methods for classification. New developments include the use of non-standard metrics (functional norms, scaled Euclidean), task-dependent automatic metric adaptation (feature selection), fuzzy classification, and similarity-based visualization of data. These properties offer new possibilities for the analysis of mass spectrometric data.

In this contribution we concentrate on recent extensions of SOMs as universal tools in the light of clinical proteomics. We focus on non-standard metrics and biomarker pattern discovery, and consider extensions of the standard SOM and LVQ that handle more general metrics. In particular, we demonstrate applications of the weighted Euclidean metric and the weighted functional norm (based on the weighted L_p-norm) or kernelized metrics that take the specific nature of mass spectra into account. This allows efficient feature selection, which may be used for biomarker identification. Adapting the algorithms to these specific requirements leads to effective tools for knowledge discovery that keep the robustness of the original simple approaches.

Further, we consider fuzzy classification and regression in the determination of clinical proteomics models. This topic deals with the wide-ranging problem of uncertainty in data. Particularly in medicine, the classification of mass spectra may be subject to individual human assessment (based on expert knowledge), multi-impairment diseases, and incomplete patient/proband information. This leads to uncertainty in the training data held in machine learning databases. We developed a semi-supervised approach based on SOMs to process such data. As a result, the algorithm provides a prototype-based fuzzy classification scheme for spectra (Fuzzy Labeled SOM, FLSOM).
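The idea of prototypes that carry fuzzy class labels can be illustrated with a minimal Python sketch. It is not the FLSOM learning rule from the chapter: the function names, the toy data, and the simple convex label update are assumptions made for illustration, and the SOM neighborhood cooperation is omitted entirely.

```python
def sq_distance(x, w):
    """Squared Euclidean distance between a spectrum x and a prototype w."""
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w))


def fuzzy_classify(x, prototypes, fuzzy_labels):
    """Return the class-membership vector of the prototype nearest to x."""
    winner = min(range(len(prototypes)),
                 key=lambda r: sq_distance(x, prototypes[r]))
    return fuzzy_labels[winner]


def update_fuzzy_label(label, target, rate):
    """Move a prototype's fuzzy label a small step toward a sample's label.

    Assumed simplification: a convex combination, so memberships that sum
    to 1 keep summing to 1 for any rate in [0, 1].
    """
    return [y + rate * (t - y) for y, t in zip(label, target)]


# Hypothetical toy data: two prototypes, two classes (control, diseased).
prototypes = [[0.0, 0.0], [1.0, 1.0]]
fuzzy_labels = [[0.9, 0.1], [0.2, 0.8]]

memberships = fuzzy_classify([0.9, 0.8], prototypes, fuzzy_labels)
print(memberships)

# A training sample with an uncertain expert label pulls the label toward it.
print(update_fuzzy_label([0.5, 0.5], [0.9, 0.1], rate=0.5))
```

Returning a membership vector instead of a single class lets a spectrum with an ambiguous diagnosis stay ambiguous in the output, which is the point of fuzzy labeling.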

We demonstrate the usefulness of these extensions of basic prototype-based data analysis with SOMs for the analysis of mass spectra in proteomics and for related knowledge discovery. In particular, we give application examples for biomarker detection based on feature selection and for fuzzy classification of spectra combined with similarity-based class visualization.
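Feature selection through a weighted Euclidean metric can be sketched as follows. This is an illustration under stated assumptions, not the chapter's implementation: the relevance weights are taken as already learned, and the function names, prototype values, and class labels are hypothetical.

```python
def weighted_sq_distance(x, w, relevances):
    """Weighted squared Euclidean distance: sum_i lambda_i * (x_i - w_i)^2."""
    return sum(lam * (xi - wi) ** 2
               for xi, wi, lam in zip(x, w, relevances))


def classify(x, prototypes, labels, relevances):
    """Return the label of the prototype closest to x under the weighted metric."""
    distances = [weighted_sq_distance(x, w, relevances) for w in prototypes]
    winner = min(range(len(prototypes)), key=distances.__getitem__)
    return labels[winner]


def rank_features(relevances):
    """Feature indices sorted from most to least relevant."""
    return sorted(range(len(relevances)),
                  key=lambda i: relevances[i], reverse=True)


# Hypothetical toy "spectra" with three intensity bins.
prototypes = [[1.0, 0.2, 5.0], [1.1, 0.9, 5.2]]
labels = ["control", "cancer"]
relevances = [0.1, 0.8, 0.1]  # assumed already learned; non-negative, sums to 1

sample = [1.0, 0.8, 5.0]
print(classify(sample, prototypes, labels, relevances))  # prints "cancer"
print(rank_features(relevances))                         # prints [1, 0, 2]
```

Bins with a large relevance weight dominate the distance, so the ranking gives a list of candidate biomarker positions, while bins with weights near zero can be pruned.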

References

  1. Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)

  2. Kohonen, T. (ed.): Self-Organizing Maps, Springer Series in Information Sciences, vol. 30. Springer, Berlin (1995) (2nd Ext. Ed. 1997)

  3. Hammer, B., Strickert, M., Villmann, T.: Supervised neural gas with general similarity measure. Neural Proc. Letters 21(1), 21–44 (2005)

  4. Pusch, W., Flocco, M., Leung, S., Thiele, H., Kostrzewa, M.: Mass spectrometry-based clinical proteomics. Pharmacogenomics 4, 463–476 (2003)

  5. Petricoin, E., Ardekani, A., Hitt, B., et al.: Use of proteomic patterns in serum to identify ovarian cancer. Lancet 359, 572–577 (2002)

  6. Wulfkuhle, J., Petricoin, E., Liotta, L.: Proteomic applications for the early detection of cancer. Nat. Rev. Cancer 3, 267–275 (2003)

  7. Ransohoff, D.: Lessons from controversy: ovarian cancer screening and serum proteomics. J. Natl. Cancer Inst. 97, 315–319 (2005)

  8. Morris, J.S., Coombes, K.R., Koomen, J., Baggerly, K.A., Kobayashi, R.: Feature extraction and quantification for mass spectrometry in biomedical applications using the mean spectrum. Bioinformatics 21(9), 1764–1775 (2005)

  9. Vannucci, M., Sha, N., Brown, P.J.: NIR and mass spectra classification: Bayesian methods for wavelet-based feature selection. Chem. and Int. Lab Systems 77, 139–148 (2005)

  10. Yu, J.S., Ongarello, S., Fiedler, R., et al.: Ovarian cancer identification based on dimensionality reduction for high-throughput mass spectrometry data. Bioinformatics 21(10), 2200–2209 (2005)

  11. de Noo, M., Deelder, A., van der Werff, M., Özalp, A., Martens, B.: MALDI-TOF serum protein profiling for detection of breast cancer. Onkologie 29, 501–506 (2006)

  12. Fiedler, G., Baumann, S., Leichtle, A., Oltmann, A., Kase, J., Thiery, J., Ceglarek, U.: Standardized peptidome profiling of human urine by magnetic bead separation and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Clinical Chemistry 53(3), 421–428 (2007)

  13. Schäffeler, E., Zanger, U., Schwab, M., et al.: Magnetic bead based human plasma profiling discriminate acute lymphatic leukaemia from non-diseased samples. In: 52nd ASMS Conference. TPV 420 (2004)

  14. Schipper, R., Loof, A., de Groot, J., Harthoorn, L., van Heerde, W., Dransfield, E.: Salivary protein/peptide profiling with SELDI-TOF-MS. Annals of the New York Academy of Sciences 1098, 498–503 (2007)

  15. Guerreiro, N., Gomez-Mancilla, B., Charmont, S.: Optimization and evaluation of SELDI-TOF mass spectrometry for protein profiling of cerebrospinal fluid. Proteome Science 4, 7 (2006)

  16. Villmann, T., Der, R., Herrmann, M., Martinetz, T.: Topology Preservation in Self–Organizing Feature Maps: Exact Definition and Measurement. IEEE Transactions on Neural Networks 8(2), 256–266 (1997)

  17. Schleif, F.M., Elssner, T., Kostrzewa, M., Villmann, T., Hammer, B.: Analysis and Visualization of Proteomic Data by Fuzzy labeled Self Organizing Maps. In: Proc. of CBMS 2006, pp. 919–924 (2006)

  18. Wang, J., Bo, T.H., Jonassen, I., Myklebost, O., Hovig, E.: Tumor classification and marker gene prediction by feature selection and fuzzy c-means clustering using microarray data. BMC Bioinformatics 4, 60 (2003)

  19. Arima, C., Hanai, T., Okamoto, M.: Gene expression analysis using fuzzy k-means clustering. Genome Informatics 14, 334–335 (2003)

  20. Bishop, C.: Pattern Recognition and Machine Learning. Springer, Science+Business Media, LLC, New York (2006)

  21. Pudil, P., Novovicova, J.: Floating search methods in feature selection. Pattern Recognition Letters 15, 1119–1125 (1994)

  22. Somol, P., Pudil, P.: Adaptive floating search methods in feature selection. Pattern Recognition Letters 20, 1157–1163 (1999)

  23. Guyon, I., Gunn, S., Nikravesh, M., Zadeh, L.A.: Feature Extraction - Foundations and Applications. Springer, Heidelberg (2006)

  24. Hecht-Nielsen, R.: Counterpropagation networks. Appl. Opt. 26(23), 4979–4984 (1987)

  25. Vuorimaa, P.: Fuzzy self-organizing map. Fuzzy Sets and Systems 66(2), 223–231 (1994)

  26. Erwin, E., Obermayer, K., Schulten, K.: Self-organizing maps: Ordering, convergence properties and energy functions. Biol. Cyb. 67(1), 47–55 (1992)

  27. Heskes, T.: In: Oja, E., Kaski, S. (eds.) Kohonen Maps, pp. 303–316. Elsevier, Amsterdam (1999)

  28. Hastie, T., Stuetzle, W.: Principal curves. J. Am. Stat. Assn. 84, 502–516 (1989)

  29. Bauer, H.U., Pawelzik, K.R.: Quantifying the neighborhood preservation of Self-Organizing Feature Maps. IEEE Trans on Neural Networks 3(4), 570–579 (1992)

  30. Schleif, F.M., Hammer, B., Villmann, T.: Supervised Neural Gas for Functional Data and its Application to the Analysis of Clinical Proteom Spectra. In: Sandoval, F., Prieto, A.G., Cabestany, J., Graña, M. (eds.) IWANN 2007. LNCS, vol. 4507, pp. 1036–1044. Springer, Heidelberg (2007)

  31. Ketterlinus, R., Hsieh, S.Y., Teng, S.H., Lee, H., Pusch, W.: Fishing for biomarkers: analyzing mass spectrometry data with the new ClinProTools software. BioTechniques 38(6), 37–40 (2005)

  32. Schleif, F.M.: Prototype based Machine Learning for Clinical Proteomics. Ph.D. Thesis, Technical University Clausthal, Clausthal-Zellerfeld, Germany (2006)

  33. Daubechies, I.: Ten lectures on wavelets. In: CBMS-NSF Regional Conference Series in Applied Mathematics, Philadelphia, PA. Society for Industrial and Applied Mathematics (SIAM), vol. 61 (1992)

  34. Mallat, S.: A wavelet tour of signal processing. Academic Press, San Diego (1998)

  35. Louis, A.K., Maaß, A.P.: Wavelets: Theory and Applications. Wiley, Chichester (1998)

  36. Lio, P.: Wavelets in bioinformatics and computational biology: state of art and perspectives. Bioinformatics 19(1), 2–9 (2003)

  37. Zhu, H., Yu, C.Y., Zhang, H.: Tree-based disease classification using protein data. Proteomics 3, 1673–1677 (2003)

  38. Waagen, D., Cassabaum, M., Scott, C., Schmitt, H.: Exploring alternative wavelet base selection techniques with application to high resolution radar classification. In: Proc. of the 6th Int. Conf. on Inf. Fusion (ISIF 2003), pp. 1078–1085. IEEE Press, Los Alamitos (2003)

  39. Leung, A., Chau, F., Gao, J.: A review on applications of wavelet transform techniques in chemical analysis: 1989-1997. Chem. and Int. Lab. Sys. 43(1), 165–184 (1998)

  40. Cohen, A., Daubechies, I., Feauveau, J.C.: Biorthogonal bases of compactly supported wavelets. Comm. Pure Appl. Math. 45(5), 485–560 (1992)

  41. Villmann, T., Strickert, M., Brüß, C., Schleif, F.M., Seiffert, U.: Visualization of fuzzy information in fuzzy-classification for image segmentation using MDS. In: Proc. of ESANN 2007, pp. 103–108 (2007)

  42. Hammer, B., Villmann, T.: Generalized relevance learning vector quantization. Neural Netw 15(8-9), 1059–1068 (2002)

  43. Lee, J., Verleysen, M.: Generalizations of the Lp Norm for time series and its application to Self-Organizing Maps. In: Cottrell, M. (ed.) 5th Workshop on Self-Organizing Maps, vol. 1, pp. 733–740 (2005)

  44. Hammer, B., Schleif, F.M., Villmann, T.: On the generalization ability of prototype-based classifiers with local relevance determination, Technical Reports University of Clausthal IfI-05-14, p. 18 (2005)

  45. Schneider, P., Biehl, M., Hammer, B.: Relevance Matrices in LVQ. In: Proc. of ESANN 2007, pp. 37–42 (2007)

  46. Baumann, S., Ceglarek, U., Fiedler, G., Lembcke, J., Leichtle, A., Thiery, J.: Standardized approach to proteomic profiling of human serum based on magnetic bead separation and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Clinical Chemistry 51, 973–980 (2005)

  47. Check, E.: Proteomics and cancer: Running before we can walk? Nature 429, 496–497 (2004)

  48. Villmann, T., Schleif, F.M., Merenyi, E., Hammer, B.: Fuzzy Labeled Self Organizing Map for Classification of Spectra. In: Sandoval, F., Prieto, A.G., Cabestany, J., Graña, M. (eds.) IWANN 2007. LNCS, vol. 4507, pp. 556–563. Springer, Heidelberg (2007)

  49. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, New York (2001)

  50. Zhang, Z., Page, G., Zhang, H.: Fishing Expedition - A supervised approach to extract patterns from a compendium of expression profiles. In: Lin, S.M., Johnson, K.F. (eds.) Methods of Microarray Data Analysis II. Kluwer Academic Publishers, Dordrecht (papers from CAMDA 2001) (2002)

  51. Lee, Y., Lee, C.K.: Classification of multiple cancer types by multicategory support vector machines using gene expression data. Bioinformatics 19(9), 1132–1139 (2003)

  52. Villmann, T., Bauer, H.U., Villmann, T.: Proceedings of WSOM 1997, Workshop on Self-Organizing Maps, Helsinki University of Technology Neural Networks Research Centre, June 4-6, pp. 286–291 (1997)

  53. Bauer, H.U., Villmann, T.: Growing a Hypercubical Output Space in a Self–Organizing Feature Map. IEEE Transactions on Neural Networks 8(2), 218–226 (1997)

  54. Carpenter, G., Grossberg, S.: The Handbook of Brain Theory and Neural Networks, 2nd edn., pp. 87–90. MIT Press, Cambridge (2003)

  55. Villmann, T., Hammer, B., Schleif, F.M., Geweniger, T.: Fuzzy classification by fuzzy labeled neural gas. Neural Networks 19(6-7), 772–779 (2006)

  56. Der, R., Herrmann, M.: Instabilities in Self-Organized Feature Maps with Short Neighborhood Range. In: Verleysen, M. (ed.) Proc. ESANN 1994, European Symp. on Artificial Neural Networks, pp. 271–276. D facto conference services, Brussels, Belgium (1994)

  57. Molinaro, A., Simon, R., Pfeiffer, R.: Prediction error estimation: A comparison of resampling methods. Bioinformatics 21(15), 3301–3307 (2005)

  58. Kearns, M.J., Mansour, Y., Ng, A., Ron, D.: An experimental and theoretical comparison of model selection methods. Machine Learning 27, 7–50 (1997)

  59. Bartlett, P.L., Boucheron, S., Lugosi, G.: Model selection and error estimation. Machine Learning 48, 85–113 (2002)

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Schleif, FM., Villmann, T., Hammer, B., van der Werff, M., Deelder, A., Tollenaar, R. (2008). Analysis of Spectral Data in Clinical Proteomics by Use of Learning Vector Quantizers. In: Smolinski, T.G., Milanova, M.G., Hassanien, AE. (eds) Computational Intelligence in Biomedicine and Bioinformatics. Studies in Computational Intelligence, vol 151. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70778-3_6

  • DOI: https://doi.org/10.1007/978-3-540-70778-3_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-70776-9

  • Online ISBN: 978-3-540-70778-3

  • eBook Packages: Engineering, Engineering (R0)
