Base-Calling for Bioinformaticians



High-throughput platforms execute billions of simultaneous sequencing reactions. Base-calling is the process of decoding the output signals of these reactions into sequence reads. In this chapter, we detail the facets of base-calling from the perspective of signal communication. We focus primarily on the Illumina high-throughput sequencing platform and review different third-party base-calling implementations.
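To make the idea of decoding signals into reads concrete, here is a minimal sketch, not any of the base callers reviewed in the chapter. It assumes each sequencing cycle yields one intensity per channel in the order A, C, G, T, and it simply picks the brightest channel. The function name `call_bases`, the channel order, and the sample intensities are all hypothetical; a practical base caller would also have to model the signal distortions that this sketch ignores.

```python
# Naive base-calling sketch (illustrative only).
# Assumption: every cycle provides four intensities ordered A, C, G, T.
CHANNELS = "ACGT"


def call_bases(intensities):
    """Return the read formed by the brightest channel in each cycle."""
    read = []
    for cycle in intensities:
        if len(cycle) != len(CHANNELS):
            raise ValueError("each cycle needs one intensity per channel")
        # Index of the channel with the highest intensity in this cycle.
        brightest = max(range(len(CHANNELS)), key=lambda i: cycle[i])
        read.append(CHANNELS[brightest])
    return "".join(read)


# Made-up intensities for three cycles of a single cluster.
signal = [
    (0.9, 0.1, 0.0, 0.1),
    (0.1, 0.2, 0.8, 0.1),
    (0.0, 0.1, 0.2, 0.7),
]
print(call_bases(signal))
```

Taking the maximum per cycle treats the channels as independent and the cycles as unrelated, which is the simplest possible decoder and a useful baseline against which more careful signal models can be compared.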





The authors would like to thank Fabian Menges, Giuseppe Narzisi, and Bud Mishra for sharing their early TotalReCaller results, Dan Valente for formulating the unified distortion model, and Dina Esposito for useful comments on the chapter. Yaniv Erlich is an Andria and Paul Heafy family fellow.



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Whitehead Institute for Biomedical Research, Cambridge, USA
