Abstract
Modern life sciences are becoming increasingly data intensive, which poses a significant challenge for most researchers and shifts the bottleneck of scientific discovery from data generation to data analysis. As a result, progress in genome research is increasingly impeded by bioinformatic hurdles. A new generation of powerful and easy-to-use genome analysis tools has been developed to address this issue, enabling biologists to perform complex bioinformatic analyses online without having to learn a programming language or to download and manually process large datasets. In this tutorial paper, we describe the use of EpiGRAPH (http://epigraph.mpi-inf.mpg.de/) and Galaxy (http://galaxyproject.org/) for genome and epigenome analysis, and we illustrate how these two web services work together to identify epigenetic modifications that are characteristic of highly polymorphic (SNP-rich) promoters. This paper is supplemented with video tutorials (http://tinyurl.com/yc5xkqq), which provide a step-by-step guide through each example analysis.
Acknowledgments
We would like to thank Joachim Büch for maintaining the IT infrastructure of EpiGRAPH, Yoichi Yamada and Sascha Tierling for providing DNA methylation data, and Martina Paulsen as well as Jörn Walter for helpful discussions. EpiGRAPH is partially funded by the European Union through the CANCERDIP project (HEALTH-F2-2007-200620; http://www.cancerdip.eu/). Galaxy is supported by NSF Grant DBI-0543285 and NIH Grant 5R01HG003646-02 as well as by funds from the Huck Institutes for Life Sciences at Penn State University and the Pennsylvania Department of Health.
Copyright information
© 2010 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Bock, C., Von Kuster, G., Halachev, K., Taylor, J., Nekrutenko, A., Lengauer, T. (2010). Web-Based Analysis of (Epi-) Genome Data Using EpiGRAPH and Galaxy. In: Barnes, M., Breen, G. (eds) Genetic Variation. Methods in Molecular Biology, vol 628. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-60327-367-1_15
DOI: https://doi.org/10.1007/978-1-60327-367-1_15
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-60327-366-4
Online ISBN: 978-1-60327-367-1
eBook Packages: Springer Protocols