Abstract
Large datasets have become ubiquitous in the biomedical sciences, and researchers in cancer genomics have generated large volumes of data from their experiments in recent years. Those who produce this data often analyze only a narrow subset of it, determined by the research question they are trying to address; this holds whether they are acting independently or as part of a large-scale cancer genomics project. As a result, other researchers have the opportunity to repurpose the data to test different hypotheses, provided it is made easily and freely available. The field benefits when more researchers gain access to data they would otherwise be unable to generate on their own, because that access leads to new biological insights. This chapter reviews several cancer genomics-related databases and outlines the type of data each contains, as well as the methods required to access it. While this list is not comprehensive, it should provide a basis for cancer researchers to begin exploring some of the many large datasets available to them.
Copyright information
© 2019 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Torcivia-Rodriguez, J., Dingerdissen, H., Chang, TC., Mazumder, R. (2019). A Primer for Access to Repositories of Cancer-Related Genomic Big Data. In: Krasnitz, A. (eds) Cancer Bioinformatics. Methods in Molecular Biology, vol 1878. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8868-6_1
DOI: https://doi.org/10.1007/978-1-4939-8868-6_1
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-8866-2
Online ISBN: 978-1-4939-8868-6
eBook Packages: Springer Protocols