Abstract
In the post-genomic era, high-throughput technologies generate large amounts of ‘omics’ data, such as transcriptomics, proteomics or metabolomics, that are measured on the same set of samples. Methods capable of jointly analyzing multiple datasets from different technology platforms, in order to unravel the relationships between different biological functional levels, have become crucial. A common way to analyze the relationships between a pair of data sources based on their correlation is canonical correlation analysis (CCA). CCA seeks linear combinations of all the variables from each dataset that maximize the correlation between them. However, in high-dimensional datasets, where the number of variables exceeds the number of experimental units, CCA may not yield meaningful information. Moreover, when collinearity exists in one or both of the datasets, CCA may not be applicable. Here, we present a novel method, LPC-KR, to extract common features from a pair of data sources using Local Principal Components and Kendall’s Ranking. The results show that the proposed algorithm outperforms CCA in many scenarios and is more robust to noisy data. Moreover, the proposed algorithm yields meaningful results when the number of variables exceeds the number of experimental units.
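The high-dimensional limitation of classical CCA mentioned above can be illustrated with a minimal sketch. It is not the LPC-KR method, and the function name, sample sizes and random data are illustrative assumptions rather than anything taken from the chapter. The sketch computes the largest canonical correlation as the cosine of the smallest principal angle between the column spaces of the two centered data matrices, a standard formulation of CCA.

```python
import numpy as np

def first_canonical_correlation(X, Y, tol=1e-10):
    """Largest canonical correlation between X and Y (rows are samples).

    Hypothetical helper for illustration only; it is not part of LPC-KR.
    """
    # Center each variable so that inner products behave like covariances.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered matrices.
    Ux, sx, _ = np.linalg.svd(Xc, full_matrices=False)
    Uy, sy, _ = np.linalg.svd(Yc, full_matrices=False)
    # Drop directions whose singular values are numerically zero.
    Ux = Ux[:, sx > tol * sx.max()]
    Uy = Uy[:, sy > tol * sy.max()]
    # Canonical correlations are the singular values of Ux^T Uy.
    s = np.linalg.svd(Ux.T @ Uy, compute_uv=False)
    return float(min(s[0], 1.0))

rng = np.random.default_rng(0)

# Many samples, few variables: independent noise gives a modest correlation.
r_low = first_canonical_correlation(rng.normal(size=(200, 3)),
                                    rng.normal(size=(200, 3)))

# More variables than samples: independent noise still correlates almost perfectly.
r_high = first_canonical_correlation(rng.normal(size=(20, 30)),
                                     rng.normal(size=(20, 25)))

print(f"n=200, p=q=3:     r = {r_low:.3f}")
print(f"n=20, p=30, q=25: r = {r_high:.3f}")
```

In the second case the two datasets are statistically independent, yet the first canonical correlation is numerically 1, because each centered matrix already spans every direction available in the sample space. This is the sense in which CCA may fail to give meaningful information when the number of variables exceeds the number of experimental units.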
Copyright information
© 2013 Springer-Verlag Wien
About this chapter
Cite this chapter
Alaydie, N., Fotouhi, F. (2013). Using Local Principal Components to Explore Relationships Between Heterogeneous Omics Datasets. In: Özyer, T., Kianmehr, K., Tan, M., Zeng, J. (eds) Information Reuse and Integration in Academia and Industry. Springer, Vienna. https://doi.org/10.1007/978-3-7091-1538-1_11
DOI: https://doi.org/10.1007/978-3-7091-1538-1_11
Publisher Name: Springer, Vienna
Print ISBN: 978-3-7091-1537-4
Online ISBN: 978-3-7091-1538-1
eBook Packages: Computer Science, Computer Science (R0)