Abstract
Genome science has progressed greatly in recent years, and its potential applications have led scientists to believe that biology will be the foremost science of the twenty-first century. The outcomes of genome research projects have a major impact on the life sciences. Obtaining genome sequences supports further analyses, such as the detection of single nucleotide polymorphisms (SNPs) and comparative genomic research. Genome research has many potential applications, including molecular medicine, risk assessment, bioarchaeology, anthropology, the study of evolution and human migration, DNA forensics, agriculture, livestock breeding, and bioprocessing. These applications cannot be realized until genomes can be sequenced within a reasonable time and at a reasonable cost. Personalized medicine, which depends on having individual genomes at hand, is a promising area; seen from this perspective, sequence assembly also represents a large potential market.
Copyright information
© 2013 The Author(s)
About this chapter
Cite this chapter
Masoudi-Nejad, A., Narimani, Z., Hosseinkhan, N. (2013). The Assembly of Sequencing Data. In: Next Generation Sequencing and Sequence Assembly. SpringerBriefs in Systems Biology, vol 4. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-7726-6_3
DOI: https://doi.org/10.1007/978-1-4614-7726-6_3
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-7725-9
Online ISBN: 978-1-4614-7726-6
eBook Packages: Biomedical and Life Sciences (R0)