The Assembly of Sequencing Data

Masoudi-Nejad, Ali; Narimani, Zahra; Hosseinkhan, Nazanin

doi:10.1007/978-1-4614-7726-6_3

Part of the book series: SpringerBriefs in Systems Biology (BRIEFSBIOSYS, volume 4)

Abstract

Genome science has progressed greatly in recent years, and its potential applications have led scientists to believe that biology will be the foremost science of the twenty-first century. The outcomes of genome research projects have had a major impact on the life sciences. Obtaining genome sequences also enables further analyses, such as the detection of single nucleotide polymorphisms (SNPs) and comparative genomics research. Genome research has many potential applications, including molecular medicine, risk assessment, bioarchaeology, anthropology, evolution and human migration, DNA forensics, agriculture, livestock breeding, and bioprocessing. These applications cannot be fully realized until genomes can be sequenced in a reasonable amount of time and at a reasonable cost. Personalized medicine, which depends on having individual genomes at hand, is a particularly promising area; viewed from the perspective of sequence assembly, it is also a large potential market.

Author information

Authors and Affiliations

Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
Ali Masoudi-Nejad, Zahra Narimani & Nazanin Hosseinkhan

Corresponding author

Correspondence to Ali Masoudi-Nejad.

About this chapter

Cite this chapter

Masoudi-Nejad, A., Narimani, Z., Hosseinkhan, N. (2013). The Assembly of Sequencing Data. In: Next Generation Sequencing and Sequence Assembly. SpringerBriefs in Systems Biology, vol 4. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-7726-6_3

DOI: https://doi.org/10.1007/978-1-4614-7726-6_3
Published: 09 July 2013
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-7725-9
Online ISBN: 978-1-4614-7726-6
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)
