Improving DNA Sequencing Accuracy and Throughput
• the underlying mechanics of electrophoresis are quite complex and still not completely understood;
• the yield of fragments of any given size can be quite small and variable;
• the mobility of fragments of a given size can depend on the terminating base;
• the data consists of samples from one or more continuous, non-stationary signals;
• boundaries between segments generated by distinct elements of the underlying sequence are ill-defined or nonexistent in the signal; and
• the sampling rate of the signal greatly exceeds the rate of evolution of the underlying discrete sequence.
Current approaches to base calling address only some of these issues, and usually in a heuristic, ad hoc way. In this article we describe some of our initial efforts towards increasing base calling accuracy and throughput by providing a rational, statistical foundation to the process of deducing sequence from signal.
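To make the signal properties listed above concrete, the following is a minimal sketch of a toy four-channel trace and a purely heuristic peak-picking caller. It is not the method developed in the article. The Gaussian peak shape, the uniform amplitude range standing in for variable yield, and the names `simulate_trace`, `call_bases_naive`, `samples_per_base`, `peak_width`, and `threshold` are all assumptions introduced for illustration.

```python
import math
import random

BASES = "ACGT"


def simulate_trace(sequence, samples_per_base=12, peak_width=3.0, seed=0):
    """Return {base: [samples]} for a toy four-channel trace.

    Assumptions (illustrative only): each base contributes one Gaussian
    peak to its own channel, peaks are evenly spaced, and the amplitude
    is drawn uniformly from [0.5, 1.5] to mimic variable fragment yield.
    """
    rng = random.Random(seed)
    n = samples_per_base * (len(sequence) + 1)
    trace = {b: [0.0] * n for b in BASES}
    for i, base in enumerate(sequence):
        center = samples_per_base * (i + 1)
        amplitude = rng.uniform(0.5, 1.5)
        for t in range(n):
            z = (t - center) / peak_width
            trace[base][t] += amplitude * math.exp(-0.5 * z * z)
    return trace


def call_bases_naive(trace, threshold=0.2):
    """Heuristic caller: emit a base at each local maximum of the
    channel that is strongest at that sample, if it exceeds threshold."""
    n = len(trace[BASES[0]])
    calls = []
    for t in range(1, n - 1):
        base = max(BASES, key=lambda b: trace[b][t])
        y = trace[base]
        if y[t] > threshold and y[t - 1] < y[t] >= y[t + 1]:
            calls.append(base)
    return "".join(calls)


if __name__ == "__main__":
    seq = "GATTACAGATTACA"
    print(call_bases_naive(simulate_trace(seq)))
```

With the default settings the peaks are well separated and many samples are taken per base, so the heuristic should recover the input sequence. Reducing `samples_per_base` relative to `peak_width` makes neighbouring peaks merge, which is the ill-defined-boundary situation described in the list, and a caller of this kind then tends to miss bases.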
Keywords: Original Signal, Fredholm Integral Equation, Base Calling, Lawrence Livermore National Laboratory, Reverse Complement