Genetic Mapping and DNA Sequencing, pp. 183–206
Improving DNA Sequencing Accuracy and Throughput
Abstract

• the underlying mechanics of electrophoresis is quite complex and still not completely understood;

• the yield of fragments of any given size can be quite small and variable;

• the mobility of fragments of a given size can depend on the terminating base;

• the data consists of samples from one or more continuous, nonstationary signals;

• boundaries between segments generated by distinct elements of the underlying sequence are ill-defined or nonexistent in the signal; and

• the sampling rate of the signal greatly exceeds the rate of evolution of the underlying discrete sequence.
Current approaches to base calling address only some of these issues, and usually in a heuristic, ad hoc way. In this article we describe some of our initial efforts towards increasing base calling accuracy and throughput by providing a rational, statistical foundation to the process of deducing sequence from signal.
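The signal properties listed above can be made concrete with a toy simulation. The following is a minimal sketch, not the authors' model: the function name `simulate_trace`, the Gaussian peak shape, the per-base time shifts, and every numeric parameter are illustrative assumptions. It shows variable peak height (yield), a base-dependent shift (mobility), peaks that broaden along the run (nonstationarity), overlap between neighboring peaks (ill-defined boundaries), and many samples per base (oversampling).

```python
import math
import random

BASES = "ACGT"

# Hypothetical per-base time shifts, in samples; not values from the chapter.
MOBILITY_SHIFT = {"A": 0.0, "C": 0.5, "G": -0.5, "T": 1.0}


def simulate_trace(sequence, samples_per_base=12, base_width=4.0, seed=0):
    """Return a dict mapping each base to a list of sampled signal values.

    Toy model: each base in `sequence` adds one Gaussian peak to its own
    channel. All parameters are assumptions chosen for illustration.
    """
    rng = random.Random(seed)
    n_samples = samples_per_base * (len(sequence) + 2)
    trace = {b: [0.0] * n_samples for b in BASES}
    for i, base in enumerate(sequence):
        # Variable yield: peak height differs from fragment to fragment.
        amplitude = rng.uniform(0.3, 1.0)
        # Base-dependent mobility: peak position shifts with the base.
        center = samples_per_base * (i + 1) + MOBILITY_SHIFT[base]
        # Nonstationarity: later peaks are assumed to be broader.
        width = base_width * (1.0 + 0.05 * i)
        channel = trace[base]
        for t in range(n_samples):
            z = (t - center) / width
            channel[t] += amplitude * math.exp(-0.5 * z * z)
    return trace


if __name__ == "__main__":
    trace = simulate_trace("GATTACA")
    # 7 bases are represented by 108 samples per channel.
    print(len(trace["A"]))
```

With these settings the peak spacing (12 samples) is only about three peak widths, so adjacent peaks overlap and there is no sample at which one base's contribution clearly ends and the next begins. A base caller has to recover the discrete sequence from such overlapping, oversampled data.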
Keywords
Porosity Migration Phosphorus Hydroxyl Electrophoresis