Bacterial Genome Annotation

Beckloff, Nicholas; Starkenburg, Shawn; Freitas, Tracey; Chain, Patrick

doi:10.1007/978-1-61779-827-6_16

Part of the book series: Methods in Molecular Biology (MIMB, volume 881)

Abstract

Annotation of prokaryotic sequences can be separated into structural and functional annotation. Structural annotation relies on algorithmic interrogation of experimental evidence to discover the physical characteristics of a gene. The goal is to construct accurate gene models so that understanding of the function or evolution of genes among organisms is not impeded. Functional annotation relies on sequence similarity to other known genes or proteins to assess the function of the gene. Combining structural and functional annotation across genomes in a comparative manner promotes more accurate annotation as well as an advanced understanding of genome evolution. As the availability of bacterial sequences increases and annotation methods improve, the value of comparative annotation will increase.
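
As a rough illustration of what building a gene model involves at its simplest, the sketch below scans a DNA string for open reading frames. It is not the method described in the chapter. It assumes ATG-only start codons, the three standard stop codons, and the forward strand only, and the function name `find_orfs` and the `min_codons` cutoff are hypothetical choices made for this example. Production gene finders rely on statistical models, both strands, and alternative start codons, none of which appear here.

```python
# Toy structural annotation: scan the forward strand for open reading frames.
# Assumptions (not from the chapter): ATG-only starts, standard stop codons,
# forward strand only, and a hypothetical minimum-length cutoff (min_codons).

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end, frame) tuples for ATG...stop ORFs.

    Coordinates are 0-based and end-exclusive; the stop codon is included
    in the reported span and in the codon count.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, frame))
                start = None  # resume the search after this stop codon
    return sorted(orfs)


if __name__ == "__main__":
    example = "CCATGAAATTTGGGTAACC"
    for start, end, frame in find_orfs(example):
        print(start, end, frame, example[start:end])
```

Functional annotation would then compare the translated product of each candidate against known proteins, a step this sketch leaves out.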

Acknowledgments

We thank all members of the B6 Genome Science group for their contributions to the establishment of standardized methods, development of software and processes, and genome projects described in this chapter. This study was supported in part by the US Department of Energy Joint Genome Institute through the Office of Science of the US Department of Energy under Contract No. DE-AC02-05CH11231 and grants from NIH (Y1-DE-6006-02), the US Department of Homeland Security under contract number HSHQDC08X00790, the US Defense Threat Reduction Agency under contract numbers B104153I and B084531I, and LANL Laboratory-Directed Research and Development under grant number (20110051DR).

Author information

Authors and Affiliations

Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
Nicholas Beckloff, Shawn Starkenburg, Tracey Freitas & Patrick Chain

Corresponding author

Correspondence to Patrick Chain.

Editor information

Editors and Affiliations

Physics & Life Sciences Directorate, Biosciences & Biotechnology Div., Lawrence Livermore National Laboratory, East Avenue 7000, Livermore, 94551, California, USA
Ali Navid

Cite this protocol

Beckloff, N., Starkenburg, S., Freitas, T., Chain, P. (2012). Bacterial Genome Annotation. In: Navid, A. (eds) Microbial Systems Biology. Methods in Molecular Biology, vol 881. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-61779-827-6_16

DOI: https://doi.org/10.1007/978-1-61779-827-6_16
Published: 20 April 2012
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-61779-826-9
Online ISBN: 978-1-61779-827-6
eBook Packages: Springer Protocols
