Supervised Normalization of Large-Scale Omic Datasets Using Blind Source Separation

Teschendorff, Andrew E.; Renard, Emilie; Absil, Pierre A.

doi:10.1007/978-3-642-55016-4_17


Chapter
First Online: 01 January 2014

Part of the book series: Signals and Communication Technology (SCT)

Abstract

Biotechnological advances in genomics have ushered in a new era of quantitative molecular biology in which it is now possible to routinely measure tens of thousands of molecular features (e.g., gene expression levels) in hundreds if not thousands of patient samples. A key statistical challenge in the analysis of such large omic datasets is the presence of confounding sources of variation, which are often either unknown or only known with error. In this chapter, we present a supervised normalization method in which Blind Source Separation (BSS) is applied to identify the sources of variation, and demonstrate that this leads to improved statistical inference in subsequent supervised analyses. The statistical framework presented here will be of interest to biologists, bioinformaticians and signal processing experts alike.


Notes

1. Instead of the residual variation matrix $R$, which requires specification of the phenotype of interest (POI) and is thus supervised.

References

1. Alexandrov, L.B., Nik-Zainal, S., Wedge, D.C., Campbell, P.J., Stratton, M.R.: Deciphering signatures of mutational processes operative in human cancer. Cell Rep. 3(1), 246–259 (2013)
2. Baufays, H.: Unification de techniques de séparation aveugle de sources avec application à l’analyse de l’expression des gènes. Ecole Polytechnique de Louvain, Master thesis with Prof. P.-A. Absil (2011)
3. Bell, C.G., Teschendorff, A.E., Rakyan, V.K., Maxwell, A.P., Beck, S., Savage, D.A.: Genome-wide DNA methylation analysis for diabetic nephropathy in type 1 diabetes mellitus. BMC Med. Genomics 3, 33 (2010)
4. Bibikova, M., Le, J., Barnes, B., Saedinia-Melnyk, S., Zhou, L., Shen, R., Gunderson, K.L.: Genome-wide DNA methylation profiling using the Infinium assay. Epigenomics 1(1), 177–200 (2009)
5. Blenkiron, C., Goldstein, L.D., Thorne, N.P., Spiteri, I., Chin, S.F., Dunning, M.J., Barbosa-Morais, N.L., Teschendorff, A.E., Green, A.R., Ellis, I.O., Tavaré, S., Caldas, C., Miska, E.A.: MicroRNA expression profiling of human breast cancer identifies new markers of tumor subtype. Genome Biol. 8(10), R214 (2007)
6. Cardoso, J.F.: High-order contrasts for independent component analysis. Neural Comput. 11(1), 157–192 (1999)
7. 1000 Genomes Project Consortium, Abecasis, G.R., Auton, A., Brooks, L.D., DePristo, M.A., Durbin, R.M., Handsaker, R.E., Kang, H.M., Marth, G.T., McVean, G.A.: An integrated map of genetic variation from 1,092 human genomes. Nature 491(7422), 56–65 (2012)
8. Curtis, C., Shah, S.P., Chin, S.F., Turashvili, G., Rueda, O.M., Dunning, M.J., Speed, D., Lynch, A.G., Samarajiwa, S., Yuan, Y., Gräf, S., Ha, G., Haffari, G., Bashashati, A., Russell, R., McKinney, S., Watson, P., Markowetz, F., Murphy, L., Ellis, I., Purushotham, A., Børresen-Dale, A.L., Brenton, J.D., Tavaré, S., Caldas, C., Aparicio, S.: The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486(7403), 346–352 (2012)
9. Deaton, A.M., Bird, A.: CpG islands and the regulation of transcription. Genes Dev. 25, 1010–1022 (2011)
10. Doane, A.S., Danso, M., Lal, P., Donaton, M., Zhang, L., Hudis, C., Gerald, W.L.: An estrogen receptor-negative breast cancer subset characterized by a hormonally regulated transcriptional program and response to androgen. Oncogene 25(28), 3994–4008 (2006)
11. Feinberg, A.P., Vogelstein, B.: Hypomethylation distinguishes genes of some human cancers from their normal counterparts. Nature 301(5895), 89–92 (1983)
12. Frigyesi, A., Veerla, S., Lindgren, D., Höglund, M.: Independent component analysis reveals new and biologically significant structures in micro array data. BMC Bioinformatics 7, 290 (2006)
13. Gao, Y., Church, G.: Improving molecular cancer class discovery through sparse non-negative matrix factorization. Bioinformatics 21(21), 3970–3975 (2005)
14. Huang, D.S., Zheng, C.H.: Independent component analysis-based penalized discriminant method for tumor classification using gene expression data. Bioinformatics 22(15), 1855–1862 (2006)
15. Hyvärinen, A., Karhunen, J., Oja, E.: Independent Component Analysis. Wiley, New York (2001)
16. Johnson, W.E., Li, C., Rabinovic, A.: Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8(1), 118–127 (2007)
17. Jones, P.A., Baylin, S.B.: The epigenomics of cancer. Cell 128(4), 683–692 (2007)
18. Lee, S.I., Batzoglou, S.: Application of independent component analysis to microarrays. Genome Biol. 4(11), R76 (2003)
19. Leek, J.T., Storey, J.D.: A general framework for multiple testing dependence. Proc. Natl. Acad. Sci. USA 105(48), 18718–18723 (2008)
20. Leek, J.T., Storey, J.D.: Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 3(9), 1724–1735 (2007)
21. Leek, J.T., Scharpf, R.B., Bravo, H.C., Simcha, D., Langmead, B., Johnson, W.E., Geman, D., Baggerly, K., Irizarry, R.A.: Tackling the widespread and critical impact of batch effects in high-throughput data. Nat. Rev. Genet. 11(10), 733–739 (2010)
22. Liao, J.C., Boscolo, R., Yang, Y.L., Tran, L.M., Sabatti, C., Roychowdhury, V.P.: Network component analysis: reconstruction of regulatory signals in biological systems. Proc. Natl. Acad. Sci. USA 100(26), 15522–15527 (2003)
23. Liebermeister, W.: Linear modes of gene expression determined by independent component analysis. Bioinformatics 18(1), 51–60 (2002)
24. Liu, Y., Aryee, M.J., Padyukov, L., Fallin, M.D., Hesselberg, E., Runarsson, A., Reinius, L., Acevedo, N., Taub, M., Ronninger, M., Shchetynsky, K., Scheynius, A., Kere, J., Alfredsson, L., Klareskog, L., Ekström, T.J., Feinberg, A.P.: Epigenome-wide association data implicate DNA methylation as an intermediary of genetic risk in rheumatoid arthritis. Nat. Biotechnol. 31(2), 142–147 (2013)
25. Liu, N.W., Sanford, T., Srinivasan, R., Liu, J.L., Khurana, K., Aprelikova, O., Valero, V., Bechert, C., Worrell, R., Pinto, P.A., Yang, Y., Merino, M., Linehan, W.M., Bratslavsky, G.: Impact of ischemia and procurement conditions on gene expression in renal cell carcinoma. Clin. Cancer Res. 19(1), 42–49 (2013)
26. Loi, S., Haibe-Kains, B., Desmedt, C., Lallemand, F., Tutt, A.M., Gillet, C., Ellis, P., Harris, A., Bergh, J., Foekens, J.A., Klijn, J.G., Larsimont, D., Buyse, M., Bontempi, G., Delorenzi, M., Piccart, M.J., Sotiriou, C.: Definition of clinically distinct molecular subtypes in estrogen receptor-positive breast carcinomas through genomic grade. J. Clin. Oncol. 25(10), 1239–1246 (2007)
27. Maegawa, S., Hinkal, G., Kim, H.S., Shen, L., Zhang, L., Zhang, J., Zhang, N., Liang, S., Donehower, L.A., Issa, J.P.: Widespread and tissue specific age-related DNA methylation changes in mice. Genome Res. 20(3), 332–340 (2010)
28. Martoglio, A.M., Miskin, J.W., Smith, S.K., MacKay, D.J.: A decomposition model to track gene expression signatures: preview on observer-independent classification of ovarian cancer. Bioinformatics 18(12), 1617–1624 (2002)
29. Plerou, V., Gopikrishnan, P., Rosenow, B., Amaral, L.A., Guhr, T., Stanley, H.E.: Random matrix approach to cross correlations in financial data. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 65(6), 066126 (2002)
30. Rakyan, V.K., Down, T.A., Maslau, S., Andrew, T., Yang, T.P., Beyan, H., Whittaker, P., McCann, O.T., Finer, S., Valdes, A.M., Leslie, R.D., Deloukas, P., Spector, T.D.: Human aging-associated DNA hypermethylation occurs preferentially at bivalent chromatin domains. Genome Res. 20(4), 434–439 (2010)
31. Rakyan, V.K., Down, T.A., Balding, D.J., Beck, S.: Epigenome-wide association studies for common human diseases. Nat. Rev. Genet. 12(8), 529–541 (2011)
32. Rhodes, D.R., Chinnaiyan, A.M.: Integrative analysis of the cancer transcriptome. Nat. Genet. 37, S31–S37 (2005)
33. Sainlez, M., Absil, P.-A., Teschendorff, A.: Gene expression data analysis using spatiotemporal blind source separation. In: Proceedings of ESANN’2009, pp. 159–164 (2009)
34. Sawyers, C.L.: The cancer biomarker problem. Nature 452(7187), 548–552 (2008)
35. Schmidt, M., Böhm, D., von Törne, C., Steiner, E., Puhl, A., Pilch, H., Lehr, H.A., Hengstler, J.G., Kölbl, H., Gehrmann, M.: The humoral immune system has a key prognostic impact in node-negative breast cancer. Cancer Res. 68(13), 5405–5413 (2008)
36. Sotiriou, C., Wirapati, P., Loi, S., Harris, A., Fox, S., Smeds, J., Nordgren, H., Farmer, P., Praz, V., Haibe-Kains, B., Desmedt, C., Larsimont, D., Cardoso, F., Peterse, H., Nuyten, D., Buyse, M., Van de Vijver, M.J., Bergh, J., Piccart, M., Delorenzi, M.: Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J. Natl. Cancer Inst. 98(4), 262–272 (2006)
37. Stone, J.V., Porrill, J., Porter, N.R., Wilkinson, I.D.: Spatiotemporal independent component analysis of event-related fMRI data using skewed probability density functions. Neuroimage 15 (2002)
38. Storey, J.D., Tibshirani, R.: Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. USA 100(16), 9440–9445 (2003)
39. Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., Mesirov, J.P.: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102(43), 15545–15550 (2005)
40. Swanton, C., Caldas, C.: From genomic landscapes to personalized cancer management-is there a roadmap? Ann. N. Y. Acad. Sci. 1210, 34–44 (2010)
41. Teschendorff, A.E., Naderi, A., Barbosa-Morais, N.L., Caldas, C.: PACK: profile analysis using clustering and kurtosis to find molecular classifiers in cancer. Bioinformatics 22(18), 2269–2275 (2006)
42. Teschendorff, A.E., Journée, M., Absil, P.A., Sepulchre, R., Caldas, C.: Elucidating the altered transcriptional programs in breast cancer using independent component analysis. PLoS Comput. Biol. 3(8), e161 (2007)
43. Teschendorff, A.E., Menon, U., Gentry-Maharaj, A., Ramus, S.J., Gayther, S.A., Apostolidou, S., Jones, A., Lechner, M., Beck, S., Jacobs, I.J., Widschwendter, M.: An epigenetic signature in peripheral blood predicts active ovarian cancer. PLoS ONE 4(12), e8274 (2009)
44. Teschendorff, A.E., Menon, U., Gentry-Maharaj, A., Ramus, S.J., Weisenberger, D.J., Shen, H., Campan, M., Noushmehr, H., Bell, C.G., Maxwell, A.P., Savage, D.A., Mueller-Holzner, E., Marth, C., Kocjan, G., Gayther, S.A., Jones, A., Beck, S., Wagner, W., Laird, P.W., Jacobs, I.J., Widschwendter, M.: Age-dependent DNA methylation of genes that are suppressed in stem cells is a hallmark of cancer. Genome Res. 20(4), 440–446 (2010)
45. Teschendorff, A.E., Zhuang, J., Widschwendter, M.: Independent surrogate variable analysis to deconvolve confounding factors in large-scale microarray profiling studies. Bioinformatics 27(11), 1496–1505 (2011)
46. The Cancer Genome Atlas Research Network: Integrated genomic analyses of ovarian carcinoma. Nature 474(7353), 609–615 (2011)
47. Theis, F., Gruber, P., Keck, I., Meyer-Bäse, A., Lang, E.: Spatiotemporal blind source separation using double-sided approximate joint diagonalization. In: Proceedings of EUSIPCO 2005, Antalya, Turkey (2005)
48. Wang, Y., Klijn, J.G., Zhang, Y., Sieuwerts, A.M., Look, M.P., Yang, F., Talantov, D., Timmermans, M., Yu, J., Jatkoe, T., Berns, E.M., Atkins, D., Foekens, J.A.: Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 365(9460), 671–679 (2005)
49. Zhang, X.W., Yap, Y.L., Wei, D., Chen, F., Danchin, A.: Molecular diagnosis of human cancer type by gene expression profiles and independent component analysis. Eur. J. Hum. Genet. 13(12), 1303–1311 (2005)
50. Zhang, S., Liu, C.C., Li, W., Shen, H., Laird, P.W., Zhou, X.J.: Discovery of multi-dimensional modules by integrative analysis of cancer genomic data. Nucleic Acids Res. 40(19), 9379–9391 (2012)
51. Zhuang, J., Widschwendter, M., Teschendorff, A.E.: A comparison of feature selection and classification methods in DNA methylation studies using the Illumina Infinium platform. BMC Bioinformatics 13, 59 (2012)


Acknowledgments

AET was supported by a Heller Research Fellowship. This paper presents research results of the Belgian Network DYSCO (Dynamical Systems, Control, and Optimization), funded by the Interuniversity Attraction Poles Program initiated by the Belgian Science Policy Office.

Author information

Authors and Affiliations

Statistical Cancer Genomics, UCL Cancer Institute, 72 Huntley Street, London, WC1E 6BT, UK
Andrew E. Teschendorff
CAS-MPG Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai Institute for Biological Sciences, 320 Yue Yang Road, Shanghai, 200031, China
Andrew E. Teschendorff
Department of Mathematical Engineering, ICTEAM Institute, Université catholique de Louvain, B-1348, Louvain-la-Neuve, Belgium
Emilie Renard & Pierre A. Absil

Corresponding author

Correspondence to Andrew E. Teschendorff.

Editor information

Editors and Affiliations

University of Technology, Sydney, Sydney, Australia
Ganesh R. Naik
University of Surrey, Guildford, United Kingdom
Wenwu Wang

Appendix

1.1 Simulated Data

We simulated data matrices with 2,000 features and 50 samples and considered the case of two confounding factors (CFs) in addition to the primary phenotype of interest. The primary phenotype is a binary variable $I_1$, with 25 samples in one class ($I_1=0$) and the remaining 25 in the other ($I_1=1$). Similarly, each confounding factor is assumed to be a binary variable affecting one half of the samples (randomly selected). For a given sample $s$ we thus have a 3-tuple of indicator variables $I_s=(I_{1s},I_{2s},I_{3s})$, where $I_2$ and $I_3$ are the indicators for the two confounding factors. Thus, samples fall into eight classes. For instance, if $I_s=(0,0,0)$ then the sample belongs to the first phenotype class ($I_1=0$) and is not affected by either confounding factor. Similarly, $I_s=(0,1,0)$ means that the sample belongs to the first phenotype class and is affected by the first confounding factor but not the second.
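The sample design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `draw_sample_indicators`, the seed, and the choice to put the first 25 samples in class $I_1=0$ are hypothetical and not taken from the chapter.

```python
import random
from collections import Counter

def draw_sample_indicators(n_samples=50, seed=0):
    """Return one (I1, I2, I3) tuple per sample (hypothetical helper)."""
    rng = random.Random(seed)
    half = n_samples // 2
    # Phenotype: first half in class I1=0, second half in class I1=1
    # (the ordering is an arbitrary convenience).
    phenotype = [0] * half + [1] * (n_samples - half)
    indicators = [phenotype]
    # Each confounding factor affects a randomly selected half of the samples.
    for _ in range(2):
        affected = set(rng.sample(range(n_samples), half))
        indicators.append([1 if s in affected else 0 for s in range(n_samples)])
    return list(zip(*indicators))

samples = draw_sample_indicators()
# Number of samples falling into each of the (at most) eight classes.
class_sizes = Counter(samples)
```

Because the confounder halves are drawn at random, the eight classes are generally not of equal size, and some may even be empty for a given seed.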

We assume 10% of features (200 features) to be true positives (TPs) discriminating between the two phenotypic classes. We model the confounding factors as follows: each confounding factor is assumed to affect 10% of features, with a 25% overlap with the TPs (i.e., 50 of the 200 TPs are confounded by each factor). Let $J_g$ denote the indicator variable of feature $g$, so $J_g$ is a 3-tuple $(J_{1g},J_{2g},J_{3g})$ with $J_{1g}$ an indicator for the feature to be a true positive, and $J_{2g}$ ($J_{3g}$) an indicator for the feature to be affected by the first (second) confounding factor. Thus, the space of features is also divided into eight groups. Furthermore, let $(e_1,e_2,e_3)$ denote the effect sizes of the primary variable and the two confounding factors, respectively, where we assume for simplicity that $e_2=e_3$. Without loss of generality, we further assume that noise is modeled by a Gaussian of mean zero and unit variance, $N(0,1)$. Thus, for a given sample $s$ we draw data values for the various feature groups as follows:

1. $J_g=(0,0,0)$: null unaffected features
$$\begin{aligned}
p(x|I_s) &\sim \delta_{J_g,000}N(0,1)
\end{aligned}$$
2. $J_g=(0,1,0)$ or $(0,0,1)$: null features affected by only one CF
$$\begin{aligned}
p(x|I_s) &\sim \delta_{J_g,010}\bigl\{\delta_{I_s,x1z}N(e_2,1) + \delta_{I_s,x0z}N(0,1)\bigr\} \\
&\quad + \delta_{J_g,001}\bigl\{\delta_{I_s,xy1}N(e_3,1) + \delta_{I_s,xy0}N(0,1)\bigr\}
\end{aligned}$$
3. $J_g=(0,1,1)$: null features affected by the two CFs
$$\begin{aligned}
p(x|I_s) &\sim \delta_{J_g,011}\bigl\{\delta_{I_s,x11}N(e_2+e_3,1) + \delta_{I_s,x01}N(e_3,1) \\
&\quad + \delta_{I_s,x10}N(e_2,1) + \delta_{I_s,x00}N(0,1)\bigr\}
\end{aligned}$$
4. $J_g=(1,0,0)$: true positives not affected by CFs
$$\begin{aligned}
p(x|I_s) &\sim \delta_{J_g,100}\bigl\{\delta_{I_s,0yz}N(0,1) \\
&\quad + \delta_{I_s,1yz}(\pi_{-1}N(-e_1,1)+\pi_1N(e_1,1))\bigr\}
\end{aligned}$$
5. $J_g=(1,0,1)$ or $(1,1,0)$: true positives affected by one CF
$$\begin{aligned}
p(x|I_s) &\sim \delta_{J_g,101}\bigl\{\delta_{I_s,0y0}N(0,1) + \delta_{I_s,0y1}N(e_3,1) \\
&\quad + \delta_{I_s,1y0}(\pi_{-1}N(-e_1,1)+\pi_1N(e_1,1)) \\
&\quad + \delta_{I_s,1y1}(\pi_{-1}N(-e_1+e_3,1)+\pi_1N(e_1+e_3,1))\bigr\} \\
&\quad + \delta_{J_g,110}\bigl\{\delta_{I_s,00z}N(0,1) + \delta_{I_s,01z}N(e_2,1) \\
&\quad + \delta_{I_s,10z}(\pi_{-1}N(-e_1,1)+\pi_1N(e_1,1)) \\
&\quad + \delta_{I_s,11z}(\pi_{-1}N(-e_1+e_2,1)+\pi_1N(e_1+e_2,1))\bigr\}
\end{aligned}$$
6. $J_g=(1,1,1)$: true positives affected by all CFs
$$\begin{aligned}
p(x|I_s) &\sim \delta_{J_g,111}\bigl\{\delta_{I_s,000}N(0,1) \\
&\quad + \delta_{I_s,010}N(e_2,1) + \delta_{I_s,001}N(e_3,1) \\
&\quad + \delta_{I_s,011}N(e_2+e_3,1) \\
&\quad + \delta_{I_s,100}(\pi_{-1}N(-e_1,1)+\pi_1N(e_1,1)) \\
&\quad + \delta_{I_s,101}(\pi_{-1}N(-e_1+e_3,1)+\pi_1N(e_1+e_3,1)) \\
&\quad + \delta_{I_s,110}(\pi_{-1}N(-e_1+e_2,1)+\pi_1N(e_1+e_2,1)) \\
&\quad + \delta_{I_s,111}(\pi_{-1}N(-e_1+e_2+e_3,1)+\pi_1N(e_1+e_2+e_3,1))\bigr\}
\end{aligned}$$

where in the above $\delta_{x'y'z',xyz}$ denotes the triple Kronecker delta: $\delta_{x'y'z',xyz}=1$ if and only if $x'=x$, $y'=y$ and $z'=z$, otherwise $\delta_{x'y'z',xyz}=0$. A letter ($x$, $y$ or $z$) left in the second index of a delta acts as a wildcard matching either value, so that, for example, $\delta_{I_s,x1z}=1$ whenever $I_{2s}=1$. The weights $(\pi_{-1},\pi_{1})$ satisfy $\pi_{-1}+\pi_1=1$. In our case, we used $\pi_1=\pi_{-1}=0.5$.
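The six cases above can be read as a single additive mean-shift model, $x_{gs} = \sigma_g e_1 J_{1g} I_{1s} + e_2 J_{2g} I_{2s} + e_3 J_{3g} I_{3s} + \epsilon_{gs}$, with $\epsilon_{gs}\sim N(0,1)$ and $\sigma_g=\pm 1$ drawn with probabilities $\pi_{\pm 1}$. The NumPy sketch below generates one such matrix. Several choices are assumptions made for illustration only: the function name `simulate`, the effect sizes `e1=e2=2.0`, placing the TPs in the first 200 rows, drawing the sign once per feature rather than once per value, and leaving the overlap between the two confounded feature sets to chance.

```python
import numpy as np

def simulate(n_features=2000, n_samples=50, n_tp=200, n_cf=200, n_overlap=50,
             e1=2.0, e2=2.0, seed=0):
    """Draw a features-by-samples matrix under the additive model (sketch)."""
    rng = np.random.default_rng(seed)
    e3 = e2  # the text assumes equal confounder effect sizes

    # Sample indicators I[k, s]: phenotype split in half, CFs hit a random half.
    I = np.zeros((3, n_samples), dtype=int)
    I[0, n_samples // 2:] = 1
    for k in (1, 2):
        I[k, rng.choice(n_samples, n_samples // 2, replace=False)] = 1

    # Feature indicators J[k, g]: the first n_tp features are true positives;
    # each CF affects n_cf features, n_overlap of which are also TPs.
    J = np.zeros((3, n_features), dtype=int)
    J[0, :n_tp] = 1
    for k in (1, 2):
        in_tp = rng.choice(n_tp, n_overlap, replace=False)
        outside = n_tp + rng.choice(n_features - n_tp, n_cf - n_overlap,
                                    replace=False)
        J[k, in_tp] = 1
        J[k, outside] = 1

    # Assumption: one sign per feature, -1 or +1 with probability 0.5 each.
    sign = rng.choice([-1.0, 1.0], size=n_features)

    # Mean shift for every (feature, sample) pair, then unit-variance noise.
    mean = (e1 * (sign * J[0])[:, None] * I[0][None, :]
            + e2 * np.outer(J[1], I[1])
            + e3 * np.outer(J[2], I[2]))
    X = mean + rng.standard_normal((n_features, n_samples))
    return X, I, J

X, I, J = simulate()
```

Read literally, the mixture in cases 4 to 6 could also be sampled independently for every value; drawing `sign` with shape `(n_features, n_samples)` instead would implement that alternative reading.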

1.2 DNA Methylation Data (Whole Blood Tissue)

In all datasets, age is the phenotype of interest. (i) T1D: this DNA methylation (DNAm) dataset consists of 187 blood samples from patients (94 women and 93 men) with type 1 diabetes. This set served as validation for a DNAm signature of aging [44]. We take bisulfite conversion efficiency (BSCE), beadchip, cohort, and sex as potential confounding factors. Samples were distributed over 17 beadchips. (ii) UKOPS1: this DNAm dataset consists of 108 blood samples from healthy postmenopausal women, which served as controls for the UKOPS study [43]. Confounding factors in this study include BSCE, beadchip, and DNA concentration (DNAc). Samples were distributed over 10 beadchips. (iii) UKOPS2: this dataset is similar to UKOPS1 but consists of 145 blood samples from healthy postmenopausal women distributed over 36 beadchips (i.e., approximately four healthy samples per chip; the other eight blood samples on each chip were from cancer cases) [43]. (iv) WBBC: this dataset consists of whole blood samples from a total of 84 women (49 healthy and 35 with breast cancer). Samples were distributed over seven beadchips, and confounders are BSCE, status (cancer/healthy), and beadchip.

1.3 Breast Cancer mRNA Expression Data

The mRNA expression profiles are all from primary breast cancers. Three of the datasets were profiled on Affymetrix platforms and the fourth on an Illumina Beadchip. Normalized data were downloaded from GEO (http://ncbi.nlm.nih.gov/), and probes mapping to the same Entrez Gene ID were averaged. Sotiriou: 14,223 genes and 101 samples [36]; Loi: 15,736 genes and 137 samples [26]; Schmidt: 13,292 genes and 200 samples [35]; Blenkiron: 17,941 genes and 128 samples [5]. In these datasets, we take histological grade as the phenotype of interest and consider estrogen receptor status and tumor size as potential confounders. Cell-cycle-related genes are known to discriminate low and high grade breast cancers irrespective of estrogen receptor status [26, 36]. Therefore, we compare the algorithms in their ability to detect specifically cell-cycle-related genes and not estrogen-regulated genes. To this end, we focused attention on two gene sets, one representing cell-cycle-related genes from Reactome (http://www.reactome.org), and another representing estrogen receptor (ESR1) upregulated genes [10]. The cell-cycle set showed negligible overlap with the ESR1 gene set; nevertheless, we removed the few overlapping genes to ensure mutual exclusivity of the two sets.
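The two preprocessing steps mentioned above (averaging probes that share a gene identifier, and making the two gene sets mutually exclusive) can be illustrated with a small dependency-free Python sketch. The helper names, probe names, and gene IDs are hypothetical toy values, and skipping probes without an annotation is an assumption rather than something the text specifies.

```python
from collections import defaultdict

def average_probes_by_gene(probe_values, probe_to_gene):
    """Average the expression profiles of probes mapping to the same gene ID."""
    grouped = defaultdict(list)
    for probe, profile in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # assumption: unannotated probes are dropped
            grouped[gene].append(profile)
    # Column-wise mean across all probes of a gene, one value per sample.
    return {
        gene: [sum(col) / len(col) for col in zip(*profiles)]
        for gene, profiles in grouped.items()
    }

def make_disjoint(set_a, set_b):
    """Remove the genes shared by both sets from each of them."""
    shared = set_a & set_b
    return set_a - shared, set_b - shared

# Toy data: two probes for gene "1001", one probe for gene "2002".
probe_values = {"p1": [1.0, 3.0], "p2": [3.0, 5.0], "p3": [0.5, 0.5]}
probe_to_gene = {"p1": "1001", "p2": "1001", "p3": "2002"}
gene_values = average_probes_by_gene(probe_values, probe_to_gene)

# Toy gene sets sharing one gene ("3003"), which is removed from both.
cell_cycle, esr1 = make_disjoint({"1001", "3003"}, {"3003", "2002"})
```

Removing shared genes from both sets, rather than from only one of them, keeps the comparison symmetric: no gene can count as evidence for both cell-cycle and estrogen-regulated signal.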

About this chapter

Cite this chapter

Teschendorff, A.E., Renard, E., Absil, P.A. (2014). Supervised Normalization of Large-Scale Omic Datasets Using Blind Source Separation. In: Naik, G., Wang, W. (eds) Blind Source Separation. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55016-4_17

DOI: https://doi.org/10.1007/978-3-642-55016-4_17
Published: 22 May 2014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-55015-7
Online ISBN: 978-3-642-55016-4
eBook Packages: Engineering, Engineering (R0)
