Integrating Heterogeneous Datasets for Cancer Module Identification

Part of the book series: Methods in Molecular Biology (MIMB, volume 1526)

Abstract

The availability of multiple heterogeneous high-throughput datasets provides an enabling resource for cancer systems biology. These data types include gene expression (GE), copy number aberration (CNA), miRNA expression, methylation, and protein–protein interactions (PPI). One important problem that such data can potentially help solve is determining which of the possible pairwise interactions among genes contribute to a range of cancer-related events, from tumorigenesis to metastasis. Various studies have shown that integrating knowledge from multi-omics datasets elucidates such complex phenomena with higher statistical significance than analyzing any single type of dataset alone. Doing so, however, requires computational methods that can process multiple data types simultaneously. This chapter reviews some of the computational methods that use integrated approaches to find cancer-related modules.
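As a minimal, hypothetical illustration of how two of these data types (PPI and GE) can be combined to score pairwise gene interactions, the Python sketch below keeps only the PPI edges whose endpoint genes are strongly co-expressed across tumor samples, then reports the connected components of the filtered network as candidate modules. This is a generic co-expression filter, not the method of this chapter or of any specific published tool; the function names, the `min_abs_corr` and `min_size` thresholds, and the toy gene identifiers and expression values are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.

    Returns 0.0 when either profile is constant (zero variance).
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def candidate_modules(ppi_edges, expression, min_abs_corr=0.8, min_size=2):
    """Hypothetical integration of PPI and GE data.

    ppi_edges:  iterable of (gene_a, gene_b) protein-protein interactions
    expression: dict mapping gene -> list of expression values (one per sample)
    Keeps a PPI edge only if |Pearson r| of its endpoints >= min_abs_corr,
    then returns connected components with at least min_size genes.
    """
    adj = {}
    for u, v in ppi_edges:
        # Skip interactions involving genes without expression data.
        if u not in expression or v not in expression:
            continue
        if abs(pearson(expression[u], expression[v])) >= min_abs_corr:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)

    # Connected components of the filtered network via iterative DFS.
    seen = set()
    modules = []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        if len(component) >= min_size:
            modules.append(sorted(component))
    return modules


# Toy data (hypothetical genes and values, four tumor samples each).
expression = {
    "G1": [1, 2, 3, 4],
    "G2": [2, 4, 6, 8],   # perfectly correlated with G1
    "G3": [8, 6, 4, 2],   # perfectly anti-correlated with G2
    "G4": [1, -1, 1, -1], # weakly correlated with G3
    "G5": [1, 2, 3, 4],
    "G6": [1, 2, 3, 5],
}
ppi_edges = [("G1", "G2"), ("G2", "G3"), ("G3", "G4"), ("G5", "G6"), ("G4", "G7")]

modules = candidate_modules(ppi_edges, expression)
print(modules)  # [['G1', 'G2', 'G3'], ['G5', 'G6']]
```

In this toy run the G3–G4 interaction is dropped because its co-expression is weak, and the G4–G7 interaction is skipped because G7 has no expression profile. Real analyses would typically add further evidence layers (for example CNA or methylation per gene), correct for multiple testing, and use a more principled module search than a single correlation threshold, which is used here only to keep the example short.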

Copyright information

© 2017 Springer Science+Business Media New York

About this protocol

Cite this protocol

Azad, A.K.M. (2017). Integrating Heterogeneous Datasets for Cancer Module Identification. In: Keith, J. (ed) Bioinformatics. Methods in Molecular Biology, vol 1526. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6613-4_7

  • DOI: https://doi.org/10.1007/978-1-4939-6613-4_7

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-6611-0

  • Online ISBN: 978-1-4939-6613-4

  • eBook Packages: Springer Protocols