Statistical Analysis of Electrophoresis Time Series for Improving Basecalling in DNA Sequencing

  • Conference paper
Advances in Mass Data Analysis of Signals and Images in Medicine, Biotechnology and Chemistry (MDA 2007)

Abstract

In automated DNA sequencing, the final algorithmic phase, referred to as basecalling, translates four time signals, each a sequence of peaks (together, the electropherogram), into the corresponding sequence of bases. Commercial basecallers detect the peaks heuristically and are very efficient when the peaks are distinct and regular in spread, amplitude and spacing. In practice, however, the signals are subject to several degradations, among which peak superposition and peak merging are the most frequent. In these cases the experiment must be repeated and human intervention is required. Recently, there have been attempts to give the problem methodological foundations and to solve it with statistical models. In this paper, we exploit a priori information and Bayesian estimation to remove degradations and recover the signals in an impulsive form, which makes basecalling straightforward.
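To make the contrast in the abstract concrete, the following is a minimal sketch of a purely heuristic basecaller working on synthetic data. It is not the method of this paper, nor that of any commercial basecaller: the Gaussian peak shape, the threshold, and the names `gaussian_trace`, `find_peaks` and `naive_basecall` are all assumptions made for illustration. It shows that thresholded local-maximum detection reads well-separated peaks correctly, yet reports a single peak where two adjacent peaks have merged.

```python
import math

def gaussian_trace(length, centers, width=2.0, amplitude=1.0):
    """Synthetic channel (assumed model): a sum of Gaussian peaks at the given sample positions."""
    return [
        sum(amplitude * math.exp(-0.5 * ((t - c) / width) ** 2) for c in centers)
        for t in range(length)
    ]

def find_peaks(signal, threshold):
    """Indices of local maxima whose height exceeds the threshold."""
    return [
        t for t in range(1, len(signal) - 1)
        if signal[t] > threshold and signal[t - 1] < signal[t] >= signal[t + 1]
    ]

def naive_basecall(traces, threshold=0.5):
    """Merge the peaks of the four channels in time order and read off the bases."""
    events = [
        (t, base)
        for base, signal in traces.items()
        for t in find_peaks(signal, threshold)
    ]
    return "".join(base for _, base in sorted(events))

# Hypothetical toy electropherogram with distinct, regularly spaced peaks.
traces = {
    "A": gaussian_trace(60, [10, 40]),
    "C": gaussian_trace(60, [20]),
    "G": gaussian_trace(60, [30]),
    "T": gaussian_trace(60, [50]),
}
called = naive_basecall(traces)

# Two peaks only one standard deviation apart merge into a single maximum,
# so the heuristic finds one peak where two bases are present.
merged_peaks = find_peaks(gaussian_trace(60, [10, 12]), threshold=0.5)
```

On the regular toy traces the heuristic returns the five bases in order, while `merged_peaks` contains a single index for what were two bases. This is the kind of degradation that the Bayesian recovery of an impulsive signal is meant to address.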


Editor information

Petra Perner, Ovidio Salvetti

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tonazzini, A., Bedini, L. (2007). Statistical Analysis of Electrophoresis Time Series for Improving Basecalling in DNA Sequencing. In: Perner, P., Salvetti, O. (eds) Advances in Mass Data Analysis of Signals and Images in Medicine, Biotechnology and Chemistry. MDA 2007. Lecture Notes in Computer Science, vol 4826. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76300-0_15

  • DOI: https://doi.org/10.1007/978-3-540-76300-0_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-76299-7

  • Online ISBN: 978-3-540-76300-0

  • eBook Packages: Computer Science, Computer Science (R0)
