Achieving Accurate Sequence and Annotation Data for Caulobacter vibrioides CB13

Current Microbiology

Abstract

Annotated sequence data are instrumental in nearly all areas of biology. However, the advent of next-generation sequencing has rapidly created an imbalance between accurate sequence data and accurate annotation data. To increase the annotation accuracy of the Caulobacter vibrioides CB13b1a (CB13) genome, we compared its PGAP and RAST annotations. The PGAP annotation contained 64 unique genes that were completely or partially absent from the RAST annotation, and the RAST annotation contained 16 genes that were not included in the PGAP annotation. Moreover, PGAP identified 73 frameshifted genes and 22 genes with an internal stop codon. In contrast, RAST annotated the larger segment of each frameshifted gene without indicating that a change in reading frame may have occurred. The RAST annotation did not include any genes with internal stop codons because it chose start codons located after the internal stop. To confirm the discrepancies between the two annotations and to verify the accuracy of the CB13 genome sequence data, we re-sequenced and re-annotated the entire genome and obtained an identical sequence, except in a small number of homopolymer regions. Comparing the two versions of the genome sequence allowed us to determine the correct number of bases in each homopolymer region, which eliminated the frameshifts in 31 genes previously annotated as frameshifted and removed 24 pseudogenes from the PGAP annotation. Each annotation system correctly identified genes that the other system missed. In addition, PGAP identified conserved gene fragments that represented the beginning of genes, but it employed no corrective method to adjust the reading frame of frameshifted genes or the start sites of genes harboring an internal stop codon. As a result, the PGAP annotation contained a large number of pseudogenes, which may reflect evolutionary history but likely do not produce gene products. These results demonstrate that re-sequencing and annotation comparisons can be used to increase the accuracy of genomic data and the corresponding gene annotation.
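The two comparisons described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline; the `Gene` record, the function names, and the choice to match genes by the coordinate of their 3' end are all assumptions made for illustration. The first function lists gene calls that are unique to each of two annotations, along with calls that share a stop site but differ in start site. The second lists homopolymer runs whose lengths differ between two versions of a sequence, assuming the versions are otherwise identical.

```python
from itertools import groupby
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical minimal gene record (not a PGAP or RAST data structure)."""
    start: int   # 1-based leftmost coordinate
    end: int     # 1-based rightmost coordinate
    strand: str  # "+" or "-"


def stop_key(gene):
    """Identify a gene by the coordinate of its 3' end and its strand.

    Two annotations that chose different start codons for the same
    gene still share this key.
    """
    three_prime = gene.end if gene.strand == "+" else gene.start
    return (three_prime, gene.strand)


def compare_annotations(a, b):
    """Return (only_in_a, only_in_b, same_stop_different_start)."""
    by_key_a = {stop_key(g): g for g in a}
    by_key_b = {stop_key(g): g for g in b}
    only_a = [g for k, g in by_key_a.items() if k not in by_key_b]
    only_b = [g for k, g in by_key_b.items() if k not in by_key_a]
    start_diff = [
        (by_key_a[k], by_key_b[k])
        for k in by_key_a
        if k in by_key_b and by_key_a[k] != by_key_b[k]
    ]
    return only_a, only_b, start_diff


def runs(seq):
    """Run-length encode a sequence: "GGGA" -> [("G", 3), ("A", 1)]."""
    return [(base, len(list(group))) for base, group in groupby(seq)]


def homopolymer_differences(seq1, seq2, min_len=3):
    """List runs whose length differs between two sequence versions.

    Assumes the sequences differ only in homopolymer length, so their
    runs pair up one-to-one; raises ValueError otherwise.
    Each result is (0-based offset in seq1, base, length1, length2).
    """
    r1, r2 = runs(seq1), runs(seq2)
    if [base for base, _ in r1] != [base for base, _ in r2]:
        raise ValueError("sequences differ by more than homopolymer length")
    diffs = []
    pos = 0
    for (base, n1), (_, n2) in zip(r1, r2):
        if n1 != n2 and max(n1, n2) >= min_len:
            diffs.append((pos, base, n1, n2))
        pos += n1
    return diffs
```

Matching on the 3' end is one way to keep a gene with a disputed start codon from being counted as unique to both annotations. A real comparison would also need to handle partial overlaps and frameshifted calls, and real assemblies would need to be aligned before their homopolymer runs could be compared.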

Funding

This work was funded in part by the National Institutes of Health Grant R25GM076277 to BE.

Author information

Corresponding author

Correspondence to Bert Ely.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 27 KB)

Cite this article

Berrios, L., Ely, B. Achieving Accurate Sequence and Annotation Data for Caulobacter vibrioides CB13. Curr Microbiol 75, 1642–1648 (2018). https://doi.org/10.1007/s00284-018-1572-3

  • DOI: https://doi.org/10.1007/s00284-018-1572-3
