A Primer for Access to Repositories of Cancer-Related Genomic Big Data

  • Protocol

Part of the book series: Methods in Molecular Biology (MIMB, volume 1878)

Abstract

The use of large datasets has become ubiquitous in the biomedical sciences. In recent years, researchers in cancer genomics have generated large volumes of data from their experiments. Those who produce this data often analyze only a narrow subset of it, chosen to fit the research question they are trying to address; this is the case whether they are acting independently or in conjunction with a large-scale cancer genomics project. This situation creates an opportunity for other researchers to repurpose the data for different hypotheses, provided the data is made easily and freely available. New biological insights that result from more researchers having access to data they would otherwise be unable to generate on their own are a boon for the field. This chapter reviews several cancer genomics-related databases and outlines the types of data they contain, as well as the methods required to access each database. While the list is not comprehensive, it should provide a basis for cancer researchers to begin exploring some of the many large datasets available to them.

Author information

Corresponding author

Correspondence to John Torcivia-Rodriguez.

Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Torcivia-Rodriguez, J., Dingerdissen, H., Chang, T.-C., Mazumder, R. (2019). A Primer for Access to Repositories of Cancer-Related Genomic Big Data. In: Krasnitz, A. (ed) Cancer Bioinformatics. Methods in Molecular Biology, vol 1878. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8868-6_1

  • DOI: https://doi.org/10.1007/978-1-4939-8868-6_1

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-8866-2

  • Online ISBN: 978-1-4939-8868-6

  • eBook Packages: Springer Protocols
