Bacterial Genome Annotation

  • Protocol
Microbial Systems Biology

Part of the book series: Methods in Molecular Biology ((MIMB,volume 881))

Abstract

Annotation of prokaryotic sequences can be divided into structural and functional annotation. Structural annotation relies on algorithmic interrogation of experimental evidence to discover the physical characteristics of a gene. Its goal is to construct accurate gene models, so that errors in gene structure do not impede the understanding of gene function or evolution across organisms. Functional annotation relies on sequence similarity to known genes or proteins to infer the function of a gene. Combining structural and functional annotation across genomes in a comparative manner yields more accurate annotation and a deeper understanding of genome evolution. As more bacterial sequences become available and annotation methods improve, the value of comparative annotation will increase.
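
To make the idea of a structural gene model concrete, the following is a minimal Python sketch of open reading frame (ORF) detection. It is not the method described in this chapter: it scans only the forward strand, assumes ATG as the sole start codon and the three standard stop codons, and uses no statistical model. The function name `find_orfs` and the `min_codons` parameter are hypothetical names chosen for illustration.

```python
from typing import List, Tuple

# Assumptions for this toy example: ATG is the only start codon and the
# standard stop codons apply. Real prokaryotic gene finders also consider
# alternative starts, the reverse strand, and coding statistics.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> List[Tuple[int, int, int]]:
    """Return (frame, start, end) for forward-strand ORFs.

    Coordinates are 0-based; `end` is exclusive and includes the stop codon.
    `min_codons` counts every codon in the ORF, start and stop included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # resume searching after this stop codon
    return orfs


if __name__ == "__main__":
    # ATG AAA TAG sits in reading frame 2 of this short sequence.
    print(find_orfs("CCATGAAATAGCC"))
```

Each reported interval is only a candidate gene model. Functional annotation would then compare the translated product against known proteins to suggest a function.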

Acknowledgments

We thank all members of the B6 Genome Science group for their contributions to the establishment of standardized methods, the development of software and processes, and the genome projects described in this chapter. This study was supported in part by the US Department of Energy Joint Genome Institute through the Office of Science of the US Department of Energy under Contract No. DE-AC02-05CH11231; by grants from NIH (Y1-DE-6006-02); by the US Department of Homeland Security under contract number HSHQDC08X00790; by the US Defense Threat Reduction Agency under contract numbers B104153I and B084531I; and by LANL Laboratory-Directed Research and Development under grant number 20110051DR.

Author information

Corresponding author

Correspondence to Patrick Chain.

Copyright information

© 2012 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Beckloff, N., Starkenburg, S., Freitas, T., Chain, P. (2012). Bacterial Genome Annotation. In: Navid, A. (ed) Microbial Systems Biology. Methods in Molecular Biology, vol 881. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-61779-827-6_16

  • DOI: https://doi.org/10.1007/978-1-61779-827-6_16

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-61779-826-9

  • Online ISBN: 978-1-61779-827-6

  • eBook Packages: Springer Protocols
