A Primer for Access to Repositories of Cancer-Related Genomic Big Data

  • John Torcivia-Rodriguez (email author)
  • Hayley Dingerdissen
  • Ting-Chia Chang
  • Raja Mazumder
Part of the Methods in Molecular Biology book series (MIMB, volume 1878)


The use of large datasets has become ubiquitous in the biomedical sciences. In recent years, researchers in cancer genomics have generated large volumes of data from their experiments. Those who produce this data often analyze only a narrow subset of it, chosen according to the research question they are trying to address; this is the case whether they are acting independently or as part of a large-scale cancer genomics project. This situation creates an opportunity for other researchers to repurpose the data to test different hypotheses, provided the data is made easily and freely available. The new biological insights that result when more researchers gain access to data they could not otherwise generate on their own are a boon for the field. This chapter reviews several cancer genomics-related databases and outlines the types of data they contain, as well as the methods required to access each database. While the list is not comprehensive, it should provide a basis for cancer researchers to begin exploring some of the many large datasets available to them.

Key words

Cancer resources, Genomics databases, Cancer ontology, Cancer genomics



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • John Torcivia-Rodriguez (1), email author
  • Hayley Dingerdissen (1)
  • Ting-Chia Chang (1)
  • Raja Mazumder (1)
  1. Department of Biochemistry and Molecular Medicine, George Washington University, Washington, DC, USA
