Abstract
Annotation of prokaryotic sequences can be separated into structural and functional annotation. Structural annotation is dependent on algorithmic interrogation of experimental evidence to discover the physical characteristics of a gene. This is done in an effort to construct accurate gene models, so understanding function or evolution of genes among organisms is not impeded. Functional annotation is dependent on sequence similarity to other known genes or proteins in an effort to assess the function of the gene. Combining structural and functional annotation across genomes in a comparative manner promotes higher levels of accurate annotation as well as an advanced understanding of genome evolution. As the availability of bacterial sequences increases and annotation methods improve, the value of comparative annotation will increase.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Wu D, Hugenholtz P, Mavromatis K, Pukall R, Dalin E, Ivanova NN, Kunin V, Goodwin L, Wu M, Tindall BJ, Hooper SD, Pati A, Lykidis A, Spring S, Anderson IJ, D'haeseleer P, Zemla A, Singer M, Lapidus A, Nolan M, Copeland A, Han C, Chen F, Cheng JF, Lucas S, Kerfeld C, Lang E, Gronow S, Chain P, Bruce D, Rubin EM, Kyrpides NC, Klenk HP, Eisen JA (2009) A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea. Nature 462(7276):1056–1060
Abbott JC (2005) WebACT–an online companion for the Artemis Comparison Tool. Bioinformatics 21:3665–3666
Ouyang S, Thibaud-Nissen F, Childs KL, Zhu W, Buell CR (2009) Plant genome annotation methods. Methods Mol Biol 513:263–282
Chain PS, Grafham DV, Fulton RS, Fitzgerald MG, Hostetler J, Muzny D, Ali J, Birren B, Bruce DC, Buhay C, Cole JR, Ding Y, Dugan S, Field D, Garrity GM, Gibbs R, Graves T, Han CS, Harrison SH, Highlander S, Hugenholtz P, Khouri HM, Kodira CD, Kolker E, Kyrpides NC, Lang D, Lapidus A, Malfatti SA, Markowitz V, Metha T, Nelson KE, Parkhill J, Pitluck S, Qin X, Read TD, Schmutz J, Sozhamannan S, Sterk P, Strausberg RL, Sutton G, Thomson NR, Tiedje JM, Weinstock G, Wollam A, Genomic Standards Consortium Human Microbiome Project Jumpstart Consortium, Detter JC (2009) Genome project standards in a new era of sequencing. Science 326:236–237
Voelkerding K, Dames S, Durtschi J (2009) Next-generation sequencing from basic research to diagnostics (Reviews). Clin Chem 658:641–658
McHardy AC (2004) Development of joint application strategies for two microbial gene finders. Bioinformatics 20:1622–1631
Badger J, Olsen G (1996) CRITICA: coding region identification tool invoking comparative analysis. Mol Biol Evol 16:512–524
Staden R (1984) Graphic methods to determine the functoin of nucleic acid sequences. Nucleic Acids Res 12:521–538
Overbeek R, Bartels D, Vonstein V, Meyer F (2007) Annotation of bacterial and archael genomes: improving accuracy and consistency. Chem Rev 107:3431–3447
Yada T, Totoki Y, Takagi T, Nakai K (2001) A novel bacterial gene-finding system with improved accuracy in locating start codons. DNA Res 8:97–106
Zhu HQ (2004) Accuracy improvement for identifying translation initiation sites in microbial genomes. Bioinformatics 20:3308–3317
Salzberg SL, Delcher AL, Kasif S et al (1998) Microbial gene identification using interpolated Markov Models. Nucleic Acids Res 26:544–548
Lowe TM, Eddy SR (1997) tRNA-scan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25:955–964
Besemer J, Borodovsky M (2005) GeneMark: web software for gene finding in prokaryotes, eukaryotes and viruses. Nucleic Acids Res 33:W451–W454
Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ (2010) Prodigal: prokaryotic gene recognition and trnaslation initiation site identification. BMC Bioinformatics 11:119–130
Besemer J, Lomsadze A, Borodovsky M (2001) GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Res 29:2607–2618
Stein L (2001) Genome annotation: from sequence to biology. Nat Rev Genet 2:493–503
Médigue C, Moszer I (2007) Annotation, comparison and databases for hundreds of bacterial genomes. Res Microbiol 158:724–736
Starkenburg SR, Chain PSG, Sayavedra-Soto LA, Hauser L, Land ML, Larimer FW, Malfatti SA, Klotz MG, Bottomley PJ, Arp DJ, Hickey WJ (2006) Genome sequence of the chemolithoautotrophic nitrite-oxidizing bacterium Nitrobacter winogradskyi Nb-255. Appl Environ Microbiol 72:2050–2063
Altschul S, Koonin E (1998) Iterated profile searches with PSI-BLAST—a tool for discovery in protein databases. Trends Biochem Soc 23:444–447
Schneider M, Tognolli M, Bairoch A (2004) The Swiss-Prot protein knowledgebase and ExPASy: providing the plant community with high quality proteomic data and tools. Plant Physiol Biochem 42:1013–1021
Sonnhammer EL, Eddy SR, Durbin R (1997) Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins 28:405–420
Zdobnov EM, Apweiler R (2001) InterProScan—an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17:847–848
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 1:25–29
Kanehisa M, Goto S (2000) KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28:27–30
Karp PD (2005) Expansion of the BioCyc collection of pathway/genome databases to 160 genomes. Nucleic Acids Res 33:6083–6089
McGarvey PB, Zhang J, Natale DA, Wu CH, Huang H (2011) Protein-centric data integration for functional analysis of comparative proteomics data. Methods Mol Biol 694:323–339
Murzin AG, Brenner SE, Hubbard T, Chothia C (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 247:536–540
Pearl F, Todd A, Sillitoe I, Dibley M, Redfern O, Lewis T, Bennett C, Marsden R, Grant A, Lee D, Akpor A, Maibaum M, Harrison A, Dallman T, Reeves G, Diboun I, Addou S, Lise S, Johnston C, Sillero A, Thornton J (2005) The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis. Nucleic Acids Res 33:247–251
Thomas PD (2003) PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification. Nucleic Acids Res 31:334–341
von Heijne G (1986) A new method for predicting signal sequence cleavage sites. Nucleic Acids Res 14:4683–4690
Krogh A, Larsson B, von Heijne G, Sonnhammer EL (2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol 305:567–580
Gaasterland T, Sensen CW (1996) Fully automated genome analysis that reflects user needs and preferences. A detailed introduction to the MAGPIE system architecture. Biochimie 78:302–310
Scharf M, Schneider R, Casari G, Bork P, Valencia A, Ouzounis C, Sander C (1994) GeneQuiz: a workbench for sequence analysis. Proc Int Conf Intell Syst Mol Biol 2:348–353
Carver T, Berriman M, Tivey A, Patel C, Bohme U, Barrell BG, Parkhill J, Rajandream MA (2008) Artemis and ACT: viewing, annotating and comparing sequences stored in a relational database. Bioinformatics 24:2672–2676
Van Domselaar GH, Stothard P, Shrivastava S, Cruz JA, Guo A, Dong X, Lu P, Szafron D, Greiner R, Wishart DS (2005) BASys: a web server for automated bacterial genome annotation. Nucleic Acids Res 33:W455–W459
Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K, Gerdes S, Glass EM, Kubal M, Meyer F, Olsen GJ, Olson R, Osterman AL, Overbeek RA, McNeil LK, Paarmann D, Paczian T, Parrello B, Pusch GD, Reich C, Stevens R, Vassieva O, Vonstein V, Wilke A, Zagnitko O (2008) The RAST server: rapid annotations using subsystems technology. BMC Genomics 9:75
Markowitz VM, Chen IMA, Palaniappan K, Chu K, Szeto E, Grechkin Y, Ratner A, Anderson I, Lykidis A, Mavromatis K, Ivanova NN, Kyrpides NC (2009) The integrated microbial genomes system: an expanding comparative analysis resource. Nucleic Acids Res 38:D382–D390
Markowitz VM, Mavromatis K, Ivanova NN, Chen IMA, Chu K, Kyrpides NC (2009) IMG ER: a system for microbial genome annotation expert review and curation. Bioinformatics 25:2271–2278
Orvis J, Crabtree J, Galens K, Gussman A, Inman JM, Lee E, Nampally S, Riley D, Sundaram JP, Felix V, Whitty B, Mahurkar A, Wortman J, White O, Angiuoli SV (2010) Ergatis: a web interface and scalable software system for bioinformatics workflows. Bioinformatics 26:1488–1492
Kislyuk AO, Katz LS, Agrawal S, Hagen MS, Conley AB, Jayaraman P, Nelakuditi V, Humphrey JC, Sammons SA, Govil D, Mair RD, Tatti KM, Tondella ML, Harcourt BH, Mayer LW, Jordan IK (2010) A computational genomics pipeline for prokaryotic sequencing projects. Bioinformatics 26:1819–1826
Meyer F, Goesmann A, Mchardy AC, Bartels D, Bekel T, Clausen È, Kalinowski È, Linke B, Rupp O, Giegerich R (2003) GenDBÐan open source genome annotation system for prokaryote genomes. Nucleic Acids Res 31:2187–2195
Carver TJ (2005) ACT: the Artemis comparison tool. Bioinformatics 21:3422–3423
Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL (2004) Versatile and open software for comparing large genomes. Genome Biol 5:R12
Goll J, Rusch DB, Tanenbaum DM, Thiagarajan M, Li K, Methé BA, Yooseph S (2010) METAREP: JCVI metagenomics reports—an open source tool for high-performance comparative metagenomics. Bioinformatics 26:2631–2632
Bakke P, Carney N, Deloache W, Gearing M, Ingvorsen K, Lotz M, Mcnair J, Penumetcha P, Simpson S, Voss L, Win M, Heyer LJ, Malcolm A (2009) Evaluation of three automated genome annotations for Halorhabdus utahensis. PLoS One 4:e6291
Acknowledgments
We thank all members of the B6 Genome Science group for their contributions to the establishment of standardized methods, development of software and processes, and genome projects described in this chapter. This study was supported in part by the US Department of Energy Joint Genome Institute through the Office of Science of the US Department of Energy under Contract No. DE-AC02-05CH11231 and grants from NIH (Y1-DE-6006-02), the US Department of Homeland Security under contract number HSHQDC08X00790, the US Defense Threat Reduction Agency under contract numbers B104153I and B084531I, and LANL Laboratory-Directed Research and Development under grant number (20110051DR).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Beckloff, N., Starkenburg, S., Freitas, T., Chain, P. (2012). Bacterial Genome Annotation. In: Navid, A. (eds) Microbial Systems Biology. Methods in Molecular Biology, vol 881. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-61779-827-6_16
Download citation
DOI: https://doi.org/10.1007/978-1-61779-827-6_16
Published:
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-61779-826-9
Online ISBN: 978-1-61779-827-6
eBook Packages: Springer Protocols