Using Local Principal Components to Explore Relationships Between Heterogeneous Omics Datasets

Chapter in: Information Reuse and Integration in Academia and Industry

Abstract

In the post-genomic era, high-throughput technologies generate large amounts of ‘omics’ data, such as transcriptomics, proteomics and metabolomics data, measured on the same set of samples. Methods that can jointly analyze multiple datasets from different technology platforms, and thereby unravel the relationships between different biological functional levels, have become crucial. A common way to analyze the relationships between a pair of data sources based on their correlation is canonical correlation analysis (CCA). CCA seeks linear combinations of the variables in each dataset that are maximally correlated with each other. However, in high-dimensional datasets, where the number of variables exceeds the number of experimental units, CCA may not yield meaningful results. Moreover, when collinearity exists in one or both datasets, the within-dataset covariance matrices that CCA must invert become singular or ill-conditioned, so CCA may not be applicable. Here, we present a novel method, LPC-KR, that extracts common features from a pair of data sources using Local Principal Components and Kendall’s Ranking. The results show that the proposed algorithm outperforms CCA in many scenarios and is more robust to noisy data. Moreover, it yields meaningful results when the number of variables exceeds the number of experimental units.
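
The abstract's description of CCA, and of why it breaks down when variables outnumber samples, can be illustrated with a minimal NumPy sketch of classical CCA. This is background only, not the chapter's LPC-KR method. The function names `first_canonical_correlation` and `inv_sqrt`, the singularity tolerance, and the optional ridge term `reg` (the idea behind regularized CCA) are illustrative assumptions. Classical CCA whitens each dataset with the inverse square root of its covariance matrix; when p > n or columns are collinear that matrix is singular, so the sketch raises an error instead of returning a meaningless value.

```python
import numpy as np


def inv_sqrt(S, tol=1e-10):
    """Inverse square root of a symmetric covariance matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(S)
    # Eigenvalues that are tiny relative to the largest one are treated as zero.
    if vals.min() <= tol * np.abs(vals).max():
        raise np.linalg.LinAlgError("covariance matrix is singular or near-singular")
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def first_canonical_correlation(X, Y, reg=0.0):
    """Largest canonical correlation between the columns of X (n x p) and Y (n x q).

    Rows of X and Y must refer to the same n samples. reg=0.0 gives
    classical CCA; reg > 0 adds a ridge term to each within-set covariance
    (an assumption for illustration, in the spirit of regularized CCA).
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)
    # Canonical correlations are the singular values of the whitened
    # cross-covariance Sxx^{-1/2} Sxy Syy^{-1/2}; whitening fails when
    # Sxx or Syy is singular (p > n, or collinear columns).
    M = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return np.linalg.svd(M, compute_uv=False)[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 20
    z = rng.normal(size=(n, 1))  # latent signal shared by both datasets
    X = np.hstack([z + 0.1 * rng.normal(size=(n, 1)), rng.normal(size=(n, 2))])
    Y = np.hstack([z + 0.1 * rng.normal(size=(n, 1)), rng.normal(size=(n, 3))])
    print("n > p:", first_canonical_correlation(X, Y))

    X_wide = rng.normal(size=(n, 50))  # 50 variables but only 20 samples
    try:
        first_canonical_correlation(X_wide, Y)
    except np.linalg.LinAlgError as err:
        print("classical CCA fails:", err)
    print("ridge-regularized:", first_canonical_correlation(X_wide, Y, reg=0.1))
```

Adding a ridge term is one common workaround, but with many more variables than samples even the regularized correlation can be high purely by chance, which is one reason a large canonical correlation in that regime is not necessarily meaningful.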

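The abstract names Kendall’s Ranking but does not describe how LPC-KR uses it. As background only, the sketch below computes Kendall's tau-b, a rank correlation that depends only on the ordering of the values, so it is unchanged by monotone transformations and less sensitive to outliers than Pearson correlation. The helper `rank_variables_by_tau` and the example variable names are hypothetical: they illustrate ordering variables by |tau| against a reference score and are not the chapter's algorithm.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = tied_x = tied_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0:
            tied_x += 1  # pair tied in x (possibly also in y)
        if dy == 0:
            tied_y += 1  # pair tied in y (possibly also in x)
        if dx != 0 and dy != 0:
            if (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    denom = sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
    if denom == 0:
        raise ValueError("tau-b is undefined for a constant sequence")
    return (concordant - discordant) / denom


def rank_variables_by_tau(columns, score):
    """Hypothetical helper: variable names sorted by |tau-b| with score, strongest first."""
    taus = {name: kendall_tau_b(values, score) for name, values in columns.items()}
    return sorted(taus, key=lambda name: abs(taus[name]), reverse=True)


if __name__ == "__main__":
    score = [0.1, 0.4, 0.35, 0.8, 0.9]  # one value per experimental unit
    columns = {
        "gene_a": [1.0, 2.0, 3.0, 4.0, 5.0],
        "gene_b": [5.0, 1.0, 4.0, 2.0, 3.0],
        "gene_c": [9.0, 7.0, 8.0, 2.0, 1.0],
    }
    print(rank_variables_by_tau(columns, score))  # ['gene_c', 'gene_a', 'gene_b']
```

The O(n²) pair loop keeps the definition visible; for large samples an O(n log n) merge-sort formulation is the usual choice.
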
Author information

Correspondence to Noor Alaydie.

Copyright information

© 2013 Springer-Verlag Wien

About this chapter

Cite this chapter

Alaydie, N., Fotouhi, F. (2013). Using Local Principal Components to Explore Relationships Between Heterogeneous Omics Datasets. In: Özyer, T., Kianmehr, K., Tan, M., Zeng, J. (eds) Information Reuse and Integration in Academia and Industry. Springer, Vienna. https://doi.org/10.1007/978-3-7091-1538-1_11

  • DOI: https://doi.org/10.1007/978-3-7091-1538-1_11

  • Publisher Name: Springer, Vienna

  • Print ISBN: 978-3-7091-1537-4

  • Online ISBN: 978-3-7091-1538-1

  • eBook Packages: Computer Science, Computer Science (R0)
