Nonlinear Dimensionality Reduction by Minimum Curvilinearity for Unsupervised Discovery of Patterns in Multidimensional Proteomic Data

  • Massimo Alessio
  • Carlo Vittorio Cannistraci
Part of the Methods in Molecular Biology book series (MIMB, volume 1384)


Dimensionality reduction is widely and successfully used to visualize and discriminate patterns hidden in multidimensional proteomic datasets. Principal component analysis (PCA), the preferred approach for linear dimensionality reduction, can have serious limitations, particularly when samples are nonlinearly related, as often occurs in two-dimensional electrophoresis (2-DE) datasets. An aggravating factor is that the robustness of PCA is impaired when the number of samples is small compared with the number of proteomic features, which is the case in high-dimensional proteomic datasets, including 2-DE ones. Here, we describe the use of minimum curvilinear embedding (MCE), a nonlinear unsupervised machine learning method for dimensionality reduction that has been successfully applied to datasets from different biological samples. In particular, we provide an example in which we directly compare the performance of MCE with that of PCA in disclosing neuropathic pain patterns hidden in a multidimensional proteomic dataset.
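
To make the comparison between a nonlinear embedding and PCA more concrete, the following is a rough Python sketch, not the authors' implementation. It assumes that an MCE-style embedding can be approximated by (i) building a minimum spanning tree over the pairwise Euclidean distances between samples, (ii) measuring the distance between two samples as the length of the path that joins them along that tree, and (iii) embedding those distances with classical multidimensional scaling. None of these steps is stated in the abstract above, and the published method may differ in details such as the distance metric, centering, or normalization. The function names (mc_distances, classical_mds, mce_sketch, pca) and the random toy matrix are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, shortest_path
from scipy.spatial.distance import pdist, squareform


def mc_distances(X):
    """Pairwise distances measured along the minimum spanning tree (MST).

    Assumption: Euclidean distances between samples (rows of X).
    Note: SciPy treats zero entries as missing edges, so exact duplicate
    samples would need special handling that this sketch omits.
    """
    D = squareform(pdist(X, metric="euclidean"))
    mst = minimum_spanning_tree(D)
    # Path length between every pair of samples along the undirected tree.
    return shortest_path(mst, method="D", directed=False)


def classical_mds(D, n_components=2):
    """Classical (Torgerson) MDS of a distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    w, V = np.linalg.eigh(B)                   # eigenvalues in ascending order
    order = np.argsort(w)[::-1][:n_components]
    w_top = np.clip(w[order], 0.0, None)       # tree distances need not be Euclidean
    return V[:, order] * np.sqrt(w_top)


def mce_sketch(X, n_components=2):
    """Hypothetical MCE-style embedding: MST path distances followed by MDS."""
    return classical_mds(mc_distances(X), n_components)


def pca(X, n_components=2):
    """Plain PCA scores via SVD of the column-centered data, for comparison."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


# Toy setting with few samples and many features (random values, not real data).
X_toy = np.random.default_rng(0).normal(size=(12, 200))
Y_mce = mce_sketch(X_toy)   # 12 samples x 2 coordinates
Y_pca = pca(X_toy)          # 12 samples x 2 coordinates
```

Both functions return one row of coordinates per sample, so the two embeddings can be plotted side by side and coloured by sample class to judge visually which one separates the groups better.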

Key words

Nonlinear dimensionality reduction; Unsupervised machine learning; Pattern recognition; Multivariate analysis; Principal component analysis; Minimum curvilinearity; Minimum curvilinear embedding; Visualization; High-dimensional data; Two-dimensional gel electrophoresis



M.A. is supported by AIRC Special Program Molecular Clinical Oncology 5 per mille n. 9965. C.V.C. is supported by the independent group leader starting grant of the Technische Universität Dresden (TUD).



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Proteome Biochemistry, IRCCS-San Raffaele Scientific Institute, Milan, Italy
  2. Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Technische Universität Dresden, Dresden, Germany
