Error Correction in Methylation Profiling From NGS Bisulfite Protocols



Whole-genome bisulfite sequencing (WGBS) has emerged as the primary technique for DNA methylation studies because of its speed, its specificity, and its ability to address new biological questions such as non-CpG context methylation and hemimethylation. Despite the improvement that WGBS represents, processing and analyzing the resulting datasets is not as straightforward as for other methylation assays, and special care is needed to obtain reliable results. As far as we know, no extensive review exists of the error sources that can bias methylation level measurements or of the algorithms that have been proposed to deal with them. In this chapter we therefore review and critically evaluate all known WGBS error sources, and suggest a set of best practices for dealing with these sources of bias in WGBS assays.


Keywords: Whole-genome Bisulfite Sequencing (WGBS); Methylation Levels; Methylation Contexts; Original Methylation Status; Bisulfite Conversion



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Centro de Genómica e Investigaciones Oncológicas, Pfizer-Universidad de Granada-Junta de Andalucía, Granada, Spain
  2. Dpto. de Genética, Facultad de Ciencias, Universidad de Granada, Campus de Fuentenueva s/n, Granada, Spain
