The Assembly of Sequencing Data

Next Generation Sequencing and Sequence Assembly

Part of the book series: SpringerBriefs in Systems Biology ((BRIEFSBIOSYS,volume 4))

Abstract

Genome science has progressed greatly in recent years, and its potential applications have led scientists to believe that biology will be the foremost science of the twenty-first century. The outcomes of genome research projects have a major impact on the life sciences. Obtaining genome sequences enables further analyses, such as the detection of single nucleotide polymorphisms (SNPs) and comparative genomic research. Genome research has many potential applications, including molecular medicine, risk assessment, bioarchaeology, anthropology, evolution and human migration, DNA forensics, agriculture, livestock breeding, and bioprocessing. These applications cannot be fully realized until genomes can be sequenced in a reasonable amount of time and at a reasonable cost. Personalized medicine, a promising area that depends on having individual genomes at hand, is also a large potential market when sequence assembly is viewed from this perspective.

Author information

Correspondence to Ali Masoudi-Nejad.

Copyright information

© 2013 The Author(s)

About this chapter

Cite this chapter

Masoudi-Nejad, A., Narimani, Z., Hosseinkhan, N. (2013). The Assembly of Sequencing Data. In: Next Generation Sequencing and Sequence Assembly. SpringerBriefs in Systems Biology, vol 4. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-7726-6_3
