Supervised Normalization of Large-Scale Omic Datasets Using Blind Source Separation

Part of the book series: Signals and Communication Technology ((SCT))

Abstract

Biotechnological advances in genomics have ushered in a new era of quantitative molecular biology in which it is now possible to routinely measure tens of thousands of molecular features (e.g., gene expression levels) in hundreds, if not thousands, of patient samples. A key statistical challenge in the analysis of such large omic datasets is the presence of confounding sources of variation, which are often either unknown or only known with error. In this chapter, we present a supervised normalization method in which Blind Source Separation (BSS) is applied to identify the sources of variation, and demonstrate that this leads to improved statistical inference in subsequent supervised analyses. The statistical framework presented here will be of interest to biologists, bioinformaticians, and signal processing experts alike.

Notes

  1. Instead of the residual variation matrix \(R\), which requires specification of the phenotype of interest (POI) and is thus supervised.

References

  1. Alexandrov, L.B., Nik-Zainal, S., Wedge, D.C., Campbell, P.J., Stratton, M.R.: Deciphering signatures of mutational processes operative in human cancer. Cell Rep. 3(1), 246–259 (2013)

  2. Baufays, H.: Unification de techniques de séparation aveugle de sources avec application à l’analyse de l’expression des gènes. Ecole Polytechnique de Louvain, Master thesis with Prof. P.-A. Absil (2011)

  3. Bell, C.G., Teschendorff, A.E., Rakyan, V.K., Maxwell, A.P., Beck, S., Savage, D.A.: Genome-wide DNA methylation analysis for diabetic nephropathy in type 1 diabetes mellitus. BMC Med. Genomics 3, 33 (2010)

  4. Bibikova, M., Le, J., Barnes, B., Saedinia-Melnyk, S., Zhou, L., Shen, R., Gunderson, K.L.: Genome-wide DNA methylation profiling using the Infinium assay. Epigenomics 1(1), 177–200 (2009)

  5. Blenkiron, C., Goldstein, L.D., Thorne, N.P., Spiteri, I., Chin, S.F., Dunning, M.J., Barbosa-Morais, N.L., Teschendorff, A.E., Green, A.R., Ellis, I.O., Tavaré, S., Caldas, C., Miska, E.A.: MicroRNA expression profiling of human breast cancer identifies new markers of tumor subtype. Genome Biol. 8(10), R214 (2007)

  6. Cardoso, J.F.: High-order contrasts for independent component analysis. Neural Comput. 11(1), 157–192 (1999)

  7. Consortium 1000 Genomes Project, Abecasis, G.R., Auton, A., Brooks, L.D., DePristo, M.A., Durbin, R.M., Handsaker, R.E., Kang, H.M., Marth, G.T., McVean, G.A.: An integrated map of genetic variation from 1,092 human genomes. Nature 491(7422), 56–65 (2012)

  8. Curtis, C., Shah, S.P., Chin, S.F., Turashvili, G., Rueda, O.M., Dunning, M.J., Speed, D., Lynch, A.G., Samarajiwa, S., Yuan, Y., Gräf, S., Ha, G., Haffari, G., Bashashati, A., Russell, R., McKinney, S., Watson, P., Markowetz, F., Murphy, L., Ellis, I., Purushotham, A., Børresen-Dale, A.L., Brenton, J.D., Tavaré, S., Caldas, C., Aparicio, S.: The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486(7403), 346–352 (2012)

  9. Deaton, A.M., Bird, A.: CpG islands and the regulation of transcription. Genes Dev. 25, 1010–1022 (2011)

  10. Doane, A.S., Danso, M., Lal, P., Donaton, M., Zhang, L., Hudis, C., Gerald, W.L.: An estrogen receptor-negative breast cancer subset characterized by a hormonally regulated transcriptional program and response to androgen. Oncogene 25(28), 3994–4008 (2006)

  11. Feinberg, A.P., Vogelstein, B.: Hypomethylation distinguishes genes of some human cancers from their normal counterparts. Nature 301(5895), 89–92 (1983)

  12. Frigyesi, A., Veerla, S., Lindgren, D., Hoglund, M.: Independent component analysis reveals new and biologically significant structures in micro array data. BMC Bioinformatics 7, 290 (2006)

  13. Gao, Y., Church, G.: Improving molecular cancer class discovery through sparse non-negative matrix factorization. Bioinformatics 21(21), 3970–3975 (2005)

  14. Huang, D.S., Zheng, C.H.: Independent component analysis-based penalized discriminant method for tumor classification using gene expression data. Bioinformatics 22(15), 1855–1862 (2006)

  15. Hyvaerinen, A., Karhunen, J., Oja, E.: Independent Component Analysis. Wiley, New York (2001)

  16. Johnson, W.E., Li, C., Rabinovic, A.: Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8(1), 118–127 (2007)

  17. Jones, P.A., Baylin, S.B.: The epigenomics of cancer. Cell 128(4), 683–692 (2007)

  18. Lee, S.I., Batzoglou, S.: Application of independent component analysis to microarrays. Genome Biol. 4(11), R76 (2003)

  19. Leek, J.T., Storey, J.D.: A general framework for multiple testing dependence. Proc. Natl. Acad. Sci. USA 105(48), 18718–18723 (2008)

  20. Leek, J.T., Storey, J.D.: Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 3(9), 1724–1735 (2007)

  21. Leek, J.T., Scharpf, R.B., Bravo, H.C., Simcha, D., Langmead, B., Johnson, W.E., Geman, D., Baggerly, K., Irizarry, R.A.: Tackling the widespread and critical impact of batch effects in high-throughput data. Nat. Rev. Genet. 11(10), 733–739 (2010)

  22. Liao, J.C., Boscolo, R., Yang, Y.L., Tran, L.M., Sabatti, C., Roychowdhury, V.P.: Network component analysis: reconstruction of regulatory signals in biological systems. Proc. Natl. Acad. Sci. USA 100(26), 15522–15527 (2003)

  23. Liebermeister, W.: Linear modes of gene expression determined by independent component analysis. Bioinformatics 18(1), 51–60 (2002)

  24. Liu, Y., Aryee, M.J., Padyukov, L., Fallin, M.D., Hesselberg, E., Runarsson, A., Reinius, L., Acevedo, N., Taub, M., Ronninger, M., Shchetynsky, K., Scheynius, A., Kere, J., Alfredsson, L., Klareskog, L., Ekström, T.J., Feinberg, A.P.: Epigenome-wide association data implicate DNA methylation as an intermediary of genetic risk in rheumatoid arthritis. Nat. Biotechnol. 31(2), 142–147 (2013)

  25. Liu, N.W., Sanford, T., Srinivasan, R., Liu, J.L., Khurana, K., Aprelikova, O., Valero, V., Bechert, C., Worrell, R., Pinto, P.A., Yang, Y., Merino, M., Linehan, W.M., Bratslavsky, G.: Impact of ischemia and procurement conditions on gene expression in renal cell carcinoma. Clin. Cancer Res. 19(1), 42–49 (2013)

  26. Loi, S., Haibe-Kains, B., Desmedt, C., Lallemand, F., Tutt, A.M., Gillet, C., Ellis, P., Harris, A., Bergh, J., Foekens, J.A., Klijn, J.G., Larsimont, D., Buyse, M., Bontempi, G., Delorenzi, M., Piccart, M.J., Sotiriou, C.: Definition of clinically distinct molecular subtypes in estrogen receptor-positive breast carcinomas through genomic grade. J. Clin. Oncol. 25(10), 1239–1246 (2007)

  27. Maegawa, S., Hinkal, G., Kim, H.S., Shen, L., Zhang, L., Zhang, J., Zhang, N., Liang, S., Donehower, L.A., Issa, J.P.: Widespread and tissue specific age-related DNA methylation changes in mice. Genome Res. 20(3), 332–340 (2010)

  28. Martoglio, A.M., Miskin, J.W., Smith, S.K., MacKay, D.J.: A decomposition model to track gene expression signatures: preview on observer-independent classification of ovarian cancer. Bioinformatics 18(12), 1617–1624 (2002)

  29. Plerou, V., Gopikrishnan, P., Rosenow, B., Amaral, L.A., Guhr, T., Stanley, H.E.: Random matrix approach to cross correlations in financial data. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 65(6), 066126 (2002)

  30. Rakyan, V.K., Down, T.A., Maslau, S., Andrew, T., Yang, T.P., Beyan, H., Whittaker, P., McCann, O.T., Finer, S., Valdes, A.M., Leslie, R.D., Deloukas, P., Spector, T.D.: Human aging-associated DNA hypermethylation occurs preferentially at bivalent chromatin domains. Genome Res. 20(4), 434–439 (2010)

  31. Rakyan, V.K., Down, T.A., Balding, D.J., Beck, S.: Epigenome-wide association studies for common human diseases. Nat. Rev. Genet. 12(8), 529–541 (2011)

  32. Rhodes, D.R., Chinnaiyan, A.M.: Integrative analysis of the cancer transcriptome. Nat. Genet. 37, S31–S37 (2005)

  33. Sainlez, M., Absil, P.-A., Teschendorff, A.: Gene expression data analysis using spatiotemporal blind source separation. In: Proceedings of ESANN’2009, pp. 159–164 (2009)

  34. Sawyers, C.L.: The cancer biomarker problem. Nature 452(7187), 548–552 (2008)

  35. Schmidt, M., Böhm, D., von Törne, C., Steiner, E., Puhl, A., Pilch, H., Lehr, H.A., Hengstler, J.G., Kölbl, H., Gehrmann, M.: The humoral immune system has a key prognostic impact in node-negative breast cancer. Cancer Res. 68(13), 5405–5413 (2008)

  36. Sotiriou, C., Wirapati, P., Loi, S., Harris, A., Fox, S., Smeds, J., Nordgren, H., Farmer, P., Praz, V., Haibe-Kains, B., Desmedt, C., Larsimont, D., Cardoso, F., Peterse, H., Nuyten, D., Buyse, M., Van de Vijver, M.J., Bergh, J., Piccart, M., Delorenzi, M.: Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J. Natl. Cancer Inst. 98(4), 262–272 (2006)

  37. Stone, J.V., Porrill, J., Porter, N.R., Wilkinson, I.D.: Spatiotemporal independent component analysis of event-related fMRI data using skewed probability density functions. Neuroimage 15 (2002)

  38. Storey, J.D., Tibshirani, R.: Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. USA 100(16), 9440–9445 (2003)

  39. Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., Mesirov, J.P.: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102(43), 15545–15550 (2005)

  40. Swanton, C., Caldas, C.: From genomic landscapes to personalized cancer management-is there a roadmap? Ann. N. Y. Acad. Sci. 1210, 34–44 (2010)

  41. Teschendorff, A.E., Naderi, A., Barbosa-Morais, N.L., Caldas, C.: Pack: profile analysis using clustering and kurtosis to find molecular classifiers in cancer. Bioinformatics 22(18), 2269–2275 (2006)

  42. Teschendorff, A.E., Journée, M., Absil, P.A., Sepulchre, R., Caldas, C.: Elucidating the altered transcriptional programs in breast cancer using independent component analysis. PLoS Comput. Biol. 3(8), e161 (2007)

  43. Teschendorff, A.E., Menon, U., Gentry-Maharaj, A., Ramus, S.J., Gayther, S.A., Apostolidou, S., Jones, A., Lechner, M., Beck, S., Jacobs, I.J., Widschwendter, M.: An epigenetic signature in peripheral blood predicts active ovarian cancer. PLoS ONE 4(12), e8274 (2009)

  44. Teschendorff, A.E., Menon, U., Gentry-Maharaj, A., Ramus, S.J., Weisenberger, D.J., Shen, H., Campan, M., Noushmehr, H., Bell, C.G., Maxwell, A.P., Savage, D.A., Mueller-Holzner, E., Marth, C., Kocjan, G., Gayther, S.A., Jones, A., Beck, S., Wagner, W., Laird, P.W., Jacobs, I.J., Widschwendter, M.: Age-dependent DNA methylation of genes that are suppressed in stem cells is a hallmark of cancer. Genome Res. 20(4), 440–446 (2010)

  45. Teschendorff, A.E., Zhuang, J., Widschwendter, M.: Independent surrogate variable analysis to deconvolve confounding factors in large-scale microarray profiling studies. Bioinformatics 27(11), 1496–1505 (2011)

  46. The Cancer Genome Atlas Research Network: Integrated genomic analyses of ovarian carcinoma. Nature 474(7353), 609–615 (2011)

  47. Theis, F., Gruber, P., Keck, I., Meyer-Bäse, A., Lang, E.: Spatiotemporal blind source separation using double-sided approximate joint diagonalization. In: Proceedings of EUSIPCO 2005, Antalya, Turkey (2005)

  48. Wang, Y., Klijn, J.G., Zhang, Y., Sieuwerts, A.M., Look, M.P., Yang, F., Talantov, D., Timmermans, M., Yu, J., Jatkoe, T., Berns, E.M., Atkins, D., Foekens, J.A.: Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 365(9460), 671–679 (2005)

  49. Zhang, X.W., Yap, Y.L., Wei, D., Chen, F., Danchin, A.: Molecular diagnosis of human cancer type by gene expression profiles and independent component analysis. Eur. J. Hum. Genet. 13(12), 1303–1311 (2005)

  50. Zhang, S., Liu, C.C., Li, W., Shen, H., Laird, P.W., Zhou, X.J.: Discovery of multi-dimensional modules by integrative analysis of cancer genomic data. Nucleic Acids Res. 40(19), 9379–9391 (2012)

  51. Zhuang, J., Widschwendter, M., Teschendorff, A.E.: A comparison of feature selection and classification methods in DNA methylation studies using the Illumina Infinium platform. BMC Bioinformatics 13, 59 (2012)

Acknowledgments

AET was supported by a Heller Research Fellowship. This paper presents research results of the Belgian Network DYSCO (Dynamical Systems, Control, and Optimization), funded by the Interuniversity Attraction Poles Program initiated by the Belgian Science Policy Office.

Corresponding author

Correspondence to Andrew E. Teschendorff.

Appendix

1.1 Simulated Data

We simulated data matrices with 2,000 features and 50 samples and considered the case of two confounding factors (CFs) in addition to the primary phenotype of interest. The primary phenotype is a binary variable \(I_1\), with 25 samples in one class (\(I_1=0\)) and the remaining 25 in the other (\(I_1=1\)). Similarly, each confounding factor is assumed to be a binary variable affecting one half of the samples (randomly selected). For a given sample \(s\) we thus have a 3-tuple of indicator variables \(I_s=(I_{1s},I_{2s},I_{3s})\), where \(I_2\) and \(I_3\) are the indicators for the two confounding factors. Thus, samples fall into eight classes. For instance, if \(I_s=(0,0,0)\), the sample belongs to the first phenotype class (\(I_1=0\)) and is not affected by either confounding factor. Similarly, \(I_s=(0,1,0)\) means that the sample belongs to the first phenotype class and is affected by the first confounding factor but not the second.
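The sample layout described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `sample_indicators`, the seed, and the choice to put the first 25 samples in class \(I_1=0\) are ours, not taken from the chapter.

```python
import numpy as np

def sample_indicators(n_samples=50, seed=0):
    """Return a (3, n_samples) 0/1 matrix whose rows are I_1, I_2, I_3."""
    rng = np.random.default_rng(seed)
    half = n_samples // 2
    I = np.zeros((3, n_samples), dtype=int)
    # Primary phenotype: first half is class 0, second half is class 1
    # (the ordering is an arbitrary choice for this sketch).
    I[0, half:] = 1
    # Each confounding factor affects a randomly selected half of the samples.
    for k in (1, 2):
        I[k, rng.choice(n_samples, size=half, replace=False)] = 1
    return I

I = sample_indicators()
# Encode each sample's 3-tuple (I_1, I_2, I_3) as an integer in 0..7,
# i.e. one label per each of the eight sample classes.
class_id = 4 * I[0] + 2 * I[1] + I[2]
```

Because the confounders are assigned at random, some of the eight classes may contain only a few samples in any single draw.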

We assume that 10% of features (200 features) are true positives (TPs) discriminating between the two phenotypic classes. We model the confounding factors as follows: each confounding factor is assumed to affect 10% of features, with a 25% overlap with the TPs (i.e., 50 of the 200 TPs are confounded by each factor). Let \(J_g\) denote the indicator variable of feature \(g\), so \(J_g\) is a 3-tuple \((J_{1g},J_{2g},J_{3g})\) with \(J_{1g}\) an indicator for the feature to be a true positive, and \(J_{2g}\) (\(J_{3g}\)) an indicator for the feature to be affected by the first (second) confounding factor. Thus, the space of features is also divided into eight groups. Furthermore, let \((e_1,e_2,e_3)\) denote the effect sizes of the primary variable and the two confounding factors, respectively, where we assume for simplicity that \(e_2=e_3\). Without loss of generality, we further assume that noise is modeled by a Gaussian of mean zero and unit variance, \(N(0,1)\). Thus, for a given sample \(s\) we draw data values for the various feature groups as follows:

  1. \(J_g=(0,0,0)\): null unaffected features

    $$\begin{aligned} p(x|I_s)&\sim \delta_{J_g,000}N(0,1) \end{aligned}$$

  2. \(J_g=(0,1,0)\) or \((0,0,1)\): null features affected by only one CF

    $$\begin{aligned} p(x|I_s)&\sim \delta_{J_g,010}\bigl\{\delta_{I_s,x1z}N(e_2,1) \\&\quad + \delta_{I_s,x0z}N(0,1)\bigr\} \\&\quad + \delta_{J_g,001}\bigl\{\delta_{I_s,xy1}N(e_3,1) \\&\quad + \delta_{I_s,xy0}N(0,1)\bigr\} \end{aligned}$$

  3. \(J_g=(0,1,1)\): null features affected by the two CFs

    $$\begin{aligned} p(x|I_s)&\sim \delta_{J_g,011}\bigl\{\delta_{I_s,x11}N(e_2+e_3,1) \\&\quad + \delta_{I_s,x01}N(e_3,1) \\&\quad + \delta_{I_s,x10}N(e_2,1) \\&\quad + \delta_{I_s,x00}N(0,1)\bigr\} \end{aligned}$$

  4. \(J_g=(1,0,0)\): true positives not affected by CFs

    $$\begin{aligned} p(x|I_s)&\sim \delta_{J_g,100}\bigl\{\delta_{I_s,0yz}N(0,1) \\&\quad + \delta_{I_s,1yz}(\pi_{-1}N(-e_1,1)+\pi_1N(e_1,1))\bigr\} \end{aligned}$$

  5. \(J_g=(1,0,1)\) or \((1,1,0)\): true positives affected by one CF

    $$\begin{aligned} p(x|I_s)&\sim \delta_{J_g,101}\bigl\{\delta_{I_s,0y0}N(0,1)+\delta_{I_s,0y1}N(e_3,1) \\&\quad + \delta_{I_s,1y0}(\pi_{-1}N(-e_1,1)+\pi_1N(e_1,1)) \\&\quad + \delta_{I_s,1y1}(\pi_{-1}N(-e_1+e_3,1) \\&\quad +\pi_1N(e_1+e_3,1))\bigr\} \\&\quad + \delta_{J_g,110}\bigl\{\delta_{I_s,00z}N(0,1)+\delta_{I_s,01z}N(e_2,1) \\&\quad + \delta_{I_s,10z}(\pi_{-1}N(-e_1,1)+\pi_1N(e_1,1)) \\&\quad + \delta_{I_s,11z}(\pi_{-1}N(-e_1+e_2,1) \\&\quad +\pi_1N(e_1+e_2,1))\bigr\} \end{aligned}$$

  6. \(J_g=(1,1,1)\): true positives affected by all CFs

    $$\begin{aligned} p(x|I_s)&\sim \delta_{J_g,111}\bigl\{\delta_{I_s,000}N(0,1) \\&\quad + \delta_{I_s,010}N(e_2,1) + \delta_{I_s,001}N(e_3,1) \\&\quad + \delta_{I_s,011}N(e_2+e_3,1) \\&\quad + \delta_{I_s,100}(\pi_{-1}N(-e_1,1)+\pi_1N(e_1,1)) \\&\quad + \delta_{I_s,101}(\pi_{-1}N(-e_1+e_3,1)\\&\quad +\pi_1N(e_1+e_3,1)) \\&\quad + \delta_{I_s,110}(\pi_{-1}N(-e_1+e_2,1)\\&\quad +\pi_1N(e_1+e_2,1)) \\&\quad + \delta_{I_s,111}(\pi_{-1}N(-e_1+e_2+e_3,1)\\&\quad +\pi_1N(e_1+e_2+e_3,1))\bigr\} \end{aligned}$$

where in the above \(\delta_{x'y'z',xyz}\) denotes the triple Kronecker delta: \(\delta_{x'y'z',xyz}=1\) if and only if \(x'=x\), \(y'=y\) and \(z'=z\), and \(\delta_{x'y'z',xyz}=0\) otherwise. A letter left in a subscript pattern (e.g., \(x1z\)) is unconstrained, so that position matches either value. The weights \((\pi_{-1},\pi_{1})\) satisfy \(\pi_{-1}+\pi_1=1\); in our case, we used \(\pi_1=\pi_{-1}=0.5\).
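The case-by-case model above is equivalent to an additive mean plus unit-variance Gaussian noise: the mean of feature \(g\) in sample \(s\) is \(\pm e_1 J_{1g}I_{1s} + e_2 J_{2g}I_{2s} + e_3 J_{3g}I_{3s}\). The following NumPy sketch draws a data matrix this way. Several details are assumptions made for illustration and are not stated in the text: the sign \(\pm\) is drawn once per feature rather than once per value, the two confounders' feature sets are drawn independently of each other, the effect sizes are placeholder values, and the function names are hypothetical.

```python
import numpy as np

def feature_indicators(rng, n_features=2000, n_tp=200, n_cf=200, n_overlap=50):
    """Return a (3, n_features) 0/1 matrix whose rows are J_1, J_2, J_3."""
    J = np.zeros((3, n_features), dtype=int)
    tp = rng.choice(n_features, size=n_tp, replace=False)
    J[0, tp] = 1
    non_tp = np.setdiff1d(np.arange(n_features), tp)
    for k in (1, 2):
        # Each factor confounds 50 TPs plus 150 null features (200 in total).
        J[k, rng.choice(tp, size=n_overlap, replace=False)] = 1
        J[k, rng.choice(non_tp, size=n_cf - n_overlap, replace=False)] = 1
    return J

def draw_data(rng, I, J, e1, e2, e3):
    """Return (X, mean): the simulated matrix and its noise-free mean."""
    # Assumption: one sign per feature, drawn with pi_{-1} = pi_1 = 0.5.
    sign = rng.choice([-1, 1], size=J.shape[1])
    mean = (e1 * np.outer(sign * J[0], I[0])
            + e2 * np.outer(J[1], I[1])
            + e3 * np.outer(J[2], I[2]))
    # Unit-variance Gaussian noise, as in the N(., 1) terms of the model.
    X = mean + rng.standard_normal(mean.shape)
    return X, mean

rng = np.random.default_rng(0)
n_samples = 50
# Sample indicators: 25/25 phenotype split, each CF hits a random half.
I = np.zeros((3, n_samples), dtype=int)
I[0, n_samples // 2:] = 1
for k in (1, 2):
    I[k, rng.choice(n_samples, size=n_samples // 2, replace=False)] = 1

J = feature_indicators(rng)
# Placeholder effect sizes; the text only requires e2 == e3.
X, mean = draw_data(rng, I, J, e1=2.0, e2=2.0, e3=2.0)
```

Returning the noise-free mean alongside `X` makes it easy to check that each of the eight feature groups receives the intended shift.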

1.2 DNA Methylation Data (Whole Blood Tissue)

In all datasets, age is the phenotype of interest. (i) T1D: this DNA methylation (DNAm) dataset consists of 187 blood samples from patients (94 women and 93 men) with type 1 diabetes. It served as a validation set for a DNAm signature of aging [44]. We take bisulfite conversion efficiency (BSCE), beadchip, cohort, and sex as potential confounding factors. Samples were distributed over 17 beadchips. (ii) UKOPS1: this DNAm dataset consists of 108 blood samples from healthy postmenopausal women, which served as controls for the UKOPS study [43]. Confounding factors in this study include BSCE, beadchip, and DNA concentration (DNAc). Samples were distributed over 10 beadchips. (iii) UKOPS2: this dataset is similar to UKOPS1 but consists of 145 blood samples from healthy postmenopausal women distributed over 36 beadchips (i.e., approximately four healthy samples per chip; the other eight blood samples per chip were from cancer cases) [43]. (iv) WBBC: this dataset consists of whole blood samples from a total of 84 women (49 healthy and 35 with breast cancer). Samples were distributed over seven beadchips, and the confounders are BSCE, status (cancer/healthy), and beadchip.

1.3 Breast Cancer mRNA Expression Data

The mRNA expression profiles are all from primary breast cancers. Three of the datasets were profiled on Affymetrix platforms, and the fourth on an Illumina Beadchip. Normalized data were downloaded from GEO (http://ncbi.nlm.nih.gov/), and probes mapping to the same Entrez gene identifier were averaged. Sotiriou: 14,223 genes and 101 samples [36]; Loi: 15,736 genes and 137 samples [26]; Schmidt: 13,292 genes and 200 samples [35]; Blenkiron: 17,941 genes and 128 samples [5]. In these datasets, we take histological grade as the phenotype of interest and consider estrogen receptor status and tumor size as potential confounders. Cell-cycle-related genes are known to discriminate between low- and high-grade breast cancers irrespective of estrogen receptor status [26, 36]. Therefore, we compare the algorithms in their ability to detect specifically cell-cycle-related genes and not estrogen-regulated genes. To this end, we focused on two gene sets, one representing cell-cycle-related genes from Reactome (http://www.reactome.org), and another representing estrogen receptor (ESR1) upregulated genes [10]. The cell-cycle set showed negligible overlap with the ESR1 gene set; nevertheless, we removed the few overlapping genes to ensure mutual exclusivity of the two sets.
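Two of the preprocessing steps above, averaging probes that share an Entrez identifier and making the two gene sets mutually exclusive, can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the helper names `average_probes` and `make_exclusive`, the toy expression values, the identifiers 101 and 202, and the gene labels "A" to "D" are all hypothetical placeholders.

```python
import numpy as np

def average_probes(expr, entrez_ids):
    """Average rows of `expr` (probes x samples) that share an Entrez ID.

    Returns the sorted unique IDs and a (genes x samples) matrix.
    """
    ids, inverse = np.unique(np.asarray(entrez_ids), return_inverse=True)
    inverse = inverse.ravel()
    sums = np.zeros((len(ids), expr.shape[1]))
    # Accumulate every probe row into the row of its gene.
    np.add.at(sums, inverse, expr)
    counts = np.bincount(inverse, minlength=len(ids))
    return ids, sums / counts[:, None]

def make_exclusive(set_a, set_b):
    """Drop genes present in both sets so the two sets become disjoint."""
    shared = set(set_a) & set(set_b)
    return set(set_a) - shared, set(set_b) - shared

# Toy example: three probes, two samples; the first two probes map to one gene.
expr = np.array([[1.0, 2.0],
                 [3.0, 4.0],
                 [5.0, 6.0]])
ids, gene_expr = average_probes(expr, [101, 101, 202])

# Toy gene sets with one shared gene, which is removed from both.
cell_cycle, esr1 = make_exclusive({"A", "B", "C"}, {"C", "D"})
```

Removing shared genes from both sets, rather than from only one, keeps the comparison symmetric; whether the chapter did exactly this is not stated, so treat it as one reasonable reading.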

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Teschendorff, A.E., Renard, E., Absil, P.A. (2014). Supervised Normalization of Large-Scale Omic Datasets Using Blind Source Separation. In: Naik, G., Wang, W. (eds) Blind Source Separation. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55016-4_17

  • DOI: https://doi.org/10.1007/978-3-642-55016-4_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-55015-7

  • Online ISBN: 978-3-642-55016-4

  • eBook Packages: Engineering, Engineering (R0)
