Using Local Principal Components to Explore Relationships Between Heterogeneous Omics Datasets

Alaydie, Noor; Fotouhi, Farshad

doi:10.1007/978-3-7091-1538-1_11

Abstract

In the post-genomic era, high-throughput technologies generate large amounts of ‘omics’ data, such as transcriptomics, proteomics, or metabolomics data, measured on the same set of samples. Methods capable of jointly analyzing multiple datasets from different technology platforms are therefore crucial for unraveling the relationships between different biological functional levels. A common way to analyze the relationships between a pair of data sources based on their correlation is canonical correlation analysis (CCA). CCA seeks linear combinations of all the variables from each dataset that maximize the correlation between them. However, in high-dimensional datasets, where the number of variables exceeds the number of experimental units, CCA may not yield meaningful information. Moreover, when collinearity exists in one or both datasets, CCA may not be applicable. Here, we present a novel method, LPC-KR, that extracts common features from a pair of data sources using Local Principal Components and Kendall’s Ranking. The results show that the proposed algorithm outperforms CCA in many scenarios and is more robust to noisy data. Moreover, the proposed algorithm yields meaningful results when the number of variables exceeds the number of experimental units.
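
The high-dimensional limitation of classical CCA mentioned above can be illustrated with a small numerical sketch. This is not the authors' LPC-KR method; it is a plain NumPy computation of canonical correlations, and the function name `canonical_correlations`, the tolerance, and the random test data are all assumptions made for illustration. When the number of variables is at least the number of samples, the first canonical correlation reaches 1 even for pure, unrelated noise.

```python
import numpy as np

def canonical_correlations(X, Y, tol=1e-10):
    """Canonical correlations between X and Y (rows = samples, columns = variables)."""
    def orthonormal_basis(A):
        A = A - A.mean(axis=0)  # center each variable
        U, s, _ = np.linalg.svd(A, full_matrices=False)
        return U[:, s > tol * s.max()]  # drop numerically null directions

    Qx, Qy = orthonormal_basis(X), orthonormal_basis(Y)
    # Singular values of Qx^T Qy are the cosines of the principal angles
    # between the two column spaces, i.e. the canonical correlations.
    rho = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(rho, 0.0, 1.0)

rng = np.random.default_rng(0)
n = 20  # number of samples (experimental units)

# Two independent noise datasets: there is no true relationship to find.
low_dim = canonical_correlations(rng.normal(size=(n, 3)), rng.normal(size=(n, 3)))
high_dim = canonical_correlations(rng.normal(size=(n, 50)), rng.normal(size=(n, 50)))

print("first canonical correlation, 3 variables each: ", low_dim[0])
print("first canonical correlation, 50 variables each:", high_dim[0])
```

With 50 variables and only 20 samples, each centered dataset spans the entire space of centered sample vectors, so a perfect correlation can always be constructed and carries no information about the biology.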

Author information

Authors and Affiliations

Department of Computer Science, College of Engineering, Wayne State University, Detroit, MI, 48202, USA
Noor Alaydie & Farshad Fotouhi

Corresponding author

Correspondence to Noor Alaydie.

Editor information

Editors and Affiliations

TOBB University Department of Computer Engineering, Sogutozu Ankara, Turkey
Tansel Özyer
Department of Electrical Engineering Thompson Engineering, University of West Ontario, London, Ontario, Canada
Keivan Kianmehr
Tobb Etü Economics and Technology Univer, Ankara, Ankara, Turkey
Mehmet Tan
Baylor College of Medicine, Houston, Texas, USA
Jia Zeng

About this chapter

Cite this chapter

Alaydie, N., Fotouhi, F. (2013). Using Local Principal Components to Explore Relationships Between Heterogeneous Omics Datasets. In: Özyer, T., Kianmehr, K., Tan, M., Zeng, J. (eds) Information Reuse and Integration in Academia and Industry. Springer, Vienna. https://doi.org/10.1007/978-3-7091-1538-1_11

DOI: https://doi.org/10.1007/978-3-7091-1538-1_11
Published: 22 August 2013
Publisher Name: Springer, Vienna
Print ISBN: 978-3-7091-1537-4
Online ISBN: 978-3-7091-1538-1
eBook Packages: Computer Science, Computer Science (R0)
