DNA Sequence Assembly and Annotation of Genes
This chapter describes the different DNA sequencing strategies and their pros and cons, to help you select the optimal strategy for your research question, and explains how to assemble and annotate DNA sequences. DNA sequencing is the determination of the order of nucleotides in parts of, or whole, chromosomes of organisms and viruses. Sequencing can target a single gene, a whole genome, or many genomes at a time, as in metagenomics. One of the most popular sequencing machines is the Illumina MiSeq, which can be used for whole-genome sequencing of small genomes, transcriptomics, and 16S rRNA metagenomics. Several samples can be multiplexed in a single run by tagging each sample with a unique combination of barcodes and indexes. Real-time, single-molecule sequencing reads the native DNA. This yields significantly longer reads, and sequence information becomes available as the bases are incorporated, i.e., in real time.

Base calling is the first step in sequencing. The electronic signal generated in the sequencing machine is separated from random noise and converted to nucleotide information. The resulting reads must then be assembled into DNA sequences that resemble the original DNA as closely as possible. Assembly can be done de novo, without a reference, or against a reference if the genome of the organism or virus is well known. The most important quality parameter to consider is coverage, the average number of reads covering each nucleotide position. Another important parameter is N50: the contig length such that contigs of that length or longer together contain at least half of the total assembly length. Different assemblies can be compared with QUAST. The “minimum information about a genome sequence” (MIGS) specification provides an exhaustive list of the information required for genomic sequences, including requirements for metadata.

Genome annotation is the identification and labeling of all the relevant features of a genomic sequence. First, this includes the coordinates, given as nucleotide positions, of the predicted coding regions.
Annotation is mainly a prediction of protein-coding genes; however, RNA genes such as rRNA genes are also identified.
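Multiplexed reads must be assigned back to their samples using their unique index combination. The following minimal Python sketch shows dual-index demultiplexing under several assumptions:

- Each read has already been paired with its two index reads.
- The sample sheet, the index sequences, and the tolerance of one mismatch per index are hypothetical values chosen for illustration.

On Illumina instruments this step is normally performed by the vendor's software rather than by hand.

```python
from collections import defaultdict

# Hypothetical sample sheet: each sample is identified by a unique
# combination of two index sequences (i7, i5). Sequences are made up.
SAMPLE_SHEET = {
    ("ATCACG", "TATAGC"): "sample_A",
    ("ATCACG", "ATAGAG"): "sample_B",
    ("CGATGT", "TATAGC"): "sample_C",
}


def hamming(a: str, b: str) -> int:
    """Count mismatching positions; assumes equal-length index sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(i7: str, i5: str, max_mismatches: int = 1) -> str:
    """Return the sample whose index pair matches within max_mismatches
    per index, or 'undetermined' if there is no unique match."""
    hits = [
        name
        for (ref7, ref5), name in SAMPLE_SHEET.items()
        if hamming(i7, ref7) <= max_mismatches and hamming(i5, ref5) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else "undetermined"


def demultiplex(reads):
    """Group (read_id, i7, i5) tuples into per-sample lists of read IDs."""
    bins = defaultdict(list)
    for read_id, i7, i5 in reads:
        bins[assign_sample(i7, i5)].append(read_id)
    return dict(bins)


reads = [
    ("r1", "ATCACG", "TATAGC"),
    ("r2", "ATCACG", "ATAGAG"),
    ("r3", "CGATGA", "TATAGC"),  # one mismatch in the i7 index
    ("r4", "GGGGGG", "CCCCCC"),  # matches no sample
]
print(demultiplex(reads))
# {'sample_A': ['r1'], 'sample_B': ['r2'], 'sample_C': ['r3'], 'undetermined': ['r4']}
```

Allowing a small number of mismatches tolerates sequencing errors in the index reads. This is only safe when the index combinations in the sample sheet differ from each other by more than twice that number.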
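The two assembly quality parameters, coverage and N50, can be made concrete with a short Python sketch.

- **Coverage:** the function uses the common estimate C = N × L / G (number of reads times read length, divided by genome size). This gives the expected mean coverage. Coverage measured by mapping reads back to an assembly will differ.
- **N50:** the function follows the usual definition. Sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the total assembly length.

The read counts, genome size, and contig lengths in the usage lines are made-up numbers.

```python
def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Expected mean coverage C = N * L / G."""
    return num_reads * read_length / genome_size


def n50(contig_lengths):
    """Return the length L such that contigs of length >= L together
    contain at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("contig_lengths must not be empty")


# Made-up example: 1,000,000 reads of 250 nt on a 5 Mb genome.
print(mean_coverage(1_000_000, 250, 5_000_000))  # 50.0
# Made-up contig lengths: total 1500, half 750; 500 + 400 = 900 >= 750.
print(n50([100, 200, 300, 400, 500]))  # 400
```

N50 is computed relative to the assembly itself, so a fragmented assembly that is missing parts of the genome can still report a respectable N50. This is one reason to compare assemblies with a dedicated tool rather than with a single number.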
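As a toy illustration of the coordinates that annotation reports, the following Python sketch does the following:

- It scans the three forward reading frames for open reading frames (ORFs).
- Each ORF starts with ATG and ends at the first in-frame stop codon.
- It returns 1-based, inclusive nucleotide coordinates.

It is a teaching simplification and not how production annotation works. It ignores the reverse strand, alternative start codons, and RNA genes, and the minimum-length threshold is an arbitrary assumption. Annotation services such as RAST rely on far more sophisticated gene-prediction methods.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30):
    """Return (start, end, frame) tuples for ORFs on the forward strand.

    Coordinates are 1-based and inclusive; end includes the stop codon.
    An ORF runs from an ATG to the first in-frame stop codon and must have
    at least min_codons codons before the stop (an assumed threshold).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start + 1, i + 3, frame + 1))
                start = None
    return orfs


print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=2))  # [(3, 17, 3)]
```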
- Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K, Gerdes S, Glass EM, Kubal M, Meyer F, Olsen GJ, Olson R, Osterman AL, Overbeek RA, McNeil LK, Paarmann D, Paczian T, Parrello B, Pusch GD, Reich C, Stevens R, Vassieva O, Vonstein V, Wilke A, Zagnitko O. 2008. The RAST Server: rapid annotations using subsystems technology. BMC Genomics 9:75.
- Field D, Garrity G, Gray T, Morrison N, Selengut J, Sterk P, Tatusova T, Thomson N, Allen MJ, Angiuoli SV, Ashburner M, Axelrod N, Baldauf S, Ballard S, Boore J, Cochrane G, Cole J, Dawyndt P, De Vos P, DePamphilis C, Edwards R, Faruque N, Feldman R, Gilbert J, Gilna P, Glöckner FO, Goldstein P, Guralnick R, Haft D, Hancock D, Hermjakob H, Hertz-Fowler C, Hugenholtz P, Joint I, Kagan L, Kane M, Kennedy J, Kowalchuk G, Kottmann R, Kolker E, Kravitz S, Kyrpides N, Leebens-Mack J, Lewis SE, Li K, Lister AL, Lord P, Maltsev N, Markowitz V, Martiny J, Methe B, Mizrachi I, Moxon R, Nelson K, Parkhill J, Proctor L, White O, Sansone SA, Spiers A, Stevens R, Swift P, Taylor C, Tateno Y, Tett A, Turner S, Ussery D, Vaughan B, Ward N, Whetzel T, San Gil I, Wilson G, Wipat A. 2008. The minimum information about a genome sequence (MIGS) specification. Nat Biotechnol 26:541–547.
- Glass EM, Wilkening J, Wilke A, Antonopoulos D, Meyer F. 2010. Using the metagenomics RAST server (MG-RAST) for analyzing shotgun metagenomes. Cold Spring Harb Protoc.
- Madigan M, Bender KS, Buckley DH, Sattley WM, Stahl D. 2019. Brock Biology of Microorganisms. Pearson, Harlow, UK.
- Nurk S, Bankevich A, Antipov D, Gurevich AA, Korobeynikov A, Lapidus A, Prjibelski AD, Pyshkin A, Sirotkin A, Sirotkin Y, Stepanauskas R, Clingenpeel SR, Woyke T, McLean JS, Lasken R, Tesler G, Alekseyev MA, Pevzner PA. 2013. Assembling single-cell genomes and mini-metagenomes from chimeric MDA products. J Comput Biol 20:714–737.
- Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, Edwards RA, Gerdes S, Parrello B, Shukla M, Vonstein V, Wattam AR, Xia F, Stevens R. 2014. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res 42(Database issue):D206–D214.