BMC Bioinformatics

, 16:P3 | Cite as

Using large public data repositories to discover novel genetic mutations with prospective links to melanoma

  • Tamas S Gal
  • Sally R Ellingson
  • Chi Wang
  • Jinpeng Liu
  • Stuart G Jarrett
  • John A D'Orazio
Open Access
Poster presentation
  • 535 Downloads

Keywords

Melanoma Data Repository Ataxia Telangiectasia Next Generation Sequencing Data Basic Demographic Information 

Background

Next generation sequencing (NGS) data analysis pipelines are frequently described in literature. NGS data is relatively easy to acquire from national data repositories and most software used in the pipelines are open source. This study extends research on the causal relation between changes in the ataxia telangiectasia and Rad3 related (ATR) pathways and melanoma[1].

Materials and methods

To study the effects of mutations in the ATR region on melanoma, we downloaded the Melanoma Genome Sequencing Project dataset (dbGaP Study Accession: phs000452.v1.p1) from the dbGaP repository[2]. The dataset contained full exome sequencing data of paired normal and tumor samples of 122 phenotyped subjects in the format of trimmed and aligned BAM files. The dataset also included basic demographic information, such as gender and age; as well as disease specific variables, such as the localization of the melanoma and stage. The total size of the dataset was over 4TB, so we only downloaded the region of interest (ATR gene region) with 50Kbp padding before and after the ATR gene region. We used an available pipeline (Figure 1) for analysis that was previously developed for a lung cancer project by the Biostatistics and Bioinformatics Shared Resource Facility of the University of Kentucky Markey Cancer Center. The details of the data analysis pipeline will be published elsewhere. We used Python to automate data submission to the pipeline in combination with a configuration file that allowed us to easily swap different versions of the tools used in the pipeline and to match normal and tumor samples for the same patient (Figures 2, 3). Our experiments were executed on the Lipscomb High Performance Computing Cluster at the University of Kentucky.
Figure 1

Pipeline for dbGaP analysis.

Figure 2

Python code for automated submission.

Figure 3

Configuration file.

Results

Though analysis and validation of the results are still ongoing at the time of this report, we can share that we identified five previously unreported somatic missense or splice site SNP mutations in the ATR gene region in melanoma patients. Results will be further validated by analysis of NGS data from melanoma cell lines.

Conclusions

The main goal of this abstract was to describe a methodology that we used to identify novel genetic markers in publicly available data. This methodology offers a cost effective way to test hypotheses drawn from laboratory research on human genome data.

Notes

Acknowledgements

This research was supported by the Cancer Research Informatics and the Biostatistics and Bioinformatics Shared Resource Facilities of the University of Kentucky Markey Cancer Center (P30CA177558).

We would like to thank the University of Kentucky Information Technology department and Center for Computational Sciences for computing time on the Lipscomb High Performance Computing Cluster and for access to other supercomputing resources.

References

  1. 1.
    Jarrett SG, Horrell EM, Christian PA, Vanover JC, Boulanger MC, Zou Y, D'Orazio JA: PKA-mediated phosphorylation of ATR promotes recruitment of XPA to UV-induced DNA damage. Mol Cell. 2014, 54 (6): 999-1011. 10.1016/j.molcel.2014.05.030.PubMedCentralCrossRefPubMedGoogle Scholar
  2. 2.
    Mailman MD, Feolo M, Jin Y, Kimura M, Tryka K, Bagoutdinov R, Hao L, Kiang A, Paschall J, Phan L, Popova N, Pretel S, Ziyabari L, Lee M, Shao Y, Wang ZY, Sirotkin K, Ward M, Kholodov M, Zbicz K, Beck J, Kimelman M, Shevelev S, Preuss D, Yaschenko E, Graeff A, Ostell J, Sherry ST: The NCBI dbGaP database of genotypes and phenotypes. Nat Genet. 2007, 39 (10): 1181-6.PubMedCentralCrossRefPubMedGoogle Scholar

Copyright information

© Gal et al. 2015

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Authors and Affiliations

  • Tamas S Gal
    • 1
  • Sally R Ellingson
    • 1
  • Chi Wang
    • 1
  • Jinpeng Liu
    • 1
  • Stuart G Jarrett
    • 1
  • John A D'Orazio
    • 1
  1. 1.Markey Cancer CenterUniversity of KentuckyLexingtonUSA

Personalised recommendations