Abstract
At this writing, public databases contain the completed, contiguous sequences of a large number of bacterial genomes (e.g., http://www.tigr.org/tdb/mdb/mdbcomplete.html) (1); the genome of the yeast Saccharomyces cerevisiae (2); and the genomes of the nematode (3), Drosophila (4), and Arabidopsis (5). Public sequencing projects for many other genomes, including several large ones, are in progress. In the publicly funded human genome sequencing effort, more than 45% of the >3-Gb genome is in finished form, including several completed chromosomes, and the remaining 55% is expected to be finished by the spring of 2003. The term "finished," as applied to sequence, has a special meaning and significance in the public genome-sequencing arena. There is general agreement that a large-insert clone, chromosome, or genome is considered finished when the accuracy of the sequence exceeds 99.99% (i.e., <1 error in 10,000 bp) and all gaps that can be filled by known techniques have been filled, so that the sequence is either completely contiguous or has very few, well-annotated gaps.
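The accuracy half of the finished criterion can be expressed as simple arithmetic. The sketch below is illustrative only and is not taken from the protocol: the function and constant names are hypothetical, it assumes the standard Phred scaling Q = -10 log10(P) described by Ewing and Green (7), and it assumes an estimated error count is already available for the clone. The gap-closure half of the criterion is not modeled.

```python
import math

# Threshold from the abstract: fewer than 1 error in 10,000 bp
# (accuracy above 99.99%). The constant name is hypothetical.
FINISHED_MAX_ERROR_RATE = 1e-4


def phred_quality(error_probability: float) -> float:
    """Convert a per-base error probability to a Phred-scaled quality value."""
    return -10.0 * math.log10(error_probability)


def meets_accuracy_threshold(estimated_errors: float, length_bp: int) -> bool:
    """Return True if the estimated error rate is below 1 error per 10,000 bp."""
    return estimated_errors / length_bp < FINISHED_MAX_ERROR_RATE


if __name__ == "__main__":
    # The finishing threshold expressed as a Phred-scaled quality value.
    print(phred_quality(FINISHED_MAX_ERROR_RATE))
    # A hypothetical 150-kb clone with an estimated 10 errors.
    print(meets_accuracy_threshold(10, 150_000))
```

On this scaling, an error rate of 1 in 10,000 corresponds to a quality value of 40, so a consensus sequence whose average quality is above that level would satisfy the accuracy part of the definition.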
References
1. Tettelin, H., Nelson, K. E., Paulsen, I. T., et al. (2001) Complete genome sequence of a virulent isolate of Streptococcus pneumoniae. Science 293, 498–506.
2. Goffeau, A., Barrell, B. G., Bussey, H., et al. (1996) Life with 6000 genes. Science 274, 563–567.
3. The C. elegans Sequencing Consortium. (1998) Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282, 2012–2018.
4. Adams, M. D., Celniker, S. E., Holt, R. A., et al. (2000) The genome sequence of Drosophila melanogaster. Science 287, 2185–2195.
5. The Arabidopsis Genome Initiative. (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815.
6. Ewing, B., Hillier, L., Wendl, M., and Green, P. (1998) Base-calling of automated sequencer traces using Phred. I. Accuracy assessment. Genome Res. 8, 175–185.
7. Ewing, B. and Green, P. (1998) Base-calling of automated sequencer traces using Phred. II. Error probabilities. Genome Res. 8, 186–194.
8. Gordon, D., Abajian, C., and Green, P. (1998) Consed: a graphical tool for sequence finishing. Genome Res. 8, 195–202.
9. Smith, T. F. and Waterman, M. S. (1981) Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197.
Copyright information
© 2004 Humana Press Inc., Totowa, NJ
About this protocol
Cite this protocol
Schmutz, J., Grimwood, J., Myers, R.M. (2004). Assembly of DNA Sequencing Data. In: Zhao, S., Stodolsky, M. (eds) Bacterial Artificial Chromosomes. Methods in Molecular Biology™, vol 255. Humana Press. https://doi.org/10.1385/1-59259-752-1:319
DOI: https://doi.org/10.1385/1-59259-752-1:319
Publisher Name: Humana Press
Print ISBN: 978-0-89603-988-9
Online ISBN: 978-1-59259-752-9
eBook Packages: Springer Protocols