
Genomic Analysis of Pure Cultures and Communities

Protocol, in: Hydrocarbon and Lipid Microbiology Protocols

Abstract

Oil-degrading bacteria and their communities have been a focus of research for the past few decades for several reasons. Studying them fills gaps in our knowledge of the major mechanisms of oil biodegradation, identifies the key organisms that drive these processes and, furthermore, shows how to manage their performance in situ to enhance biodegradation rates. Historically, genomic studies have focused particularly on the so-called marine hydrocarbonoclastic bacteria, specialists in petroleum biodegradation with very restricted substrate profiles. Apart from their utility in environmental cleanup, oil-degrading bacteria possess an array of enzymes and pathways with great potential for biotechnological applications such as biopolymer production, oxidation-reduction reactions, chiral synthesis and biosurfactant production. In this chapter we describe current methods for genome and metagenome sequencing and annotation. Importantly, these methods are not limited to a particular group of microorganisms and are therefore almost universally applicable. We focus exclusively on methods and tools that anyone can use on a non-commercial basis. Because numerous alternative methods and approaches exist, we have selected, somewhat arbitrarily, reliable protocols that a typical biologist without extensive training in computational biology can follow.
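
As an illustration of the planning arithmetic that usually precedes genome sequencing (this sketch is not taken from the chapter itself), the expected mean coverage can be estimated with the Lander–Waterman model, C = N·L/G, where N is the number of reads, L the read length and G the genome size; under a Poisson approximation the expected fraction of the genome left uncovered is about e^(−C). The function names and the example genome size and read length are hypothetical, and the model assumes uniform, independent read placement, which real libraries violate (GC bias, PCR duplicates), so treat the results as optimistic estimates.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Expected mean depth C = N * L / G (Lander-Waterman model)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def fraction_uncovered(coverage: float) -> float:
    """Poisson approximation: probability that a base is covered by zero reads, e^(-C)."""
    return math.exp(-coverage)


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches at least the target mean coverage."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_coverage * genome_size / read_length)


if __name__ == "__main__":
    # Hypothetical example: a 3 Mb bacterial genome sequenced with 150 bp reads
    genome_size = 3_000_000
    read_length = 150
    n = reads_needed(50, read_length, genome_size)
    c = mean_coverage(n, read_length, genome_size)
    print(f"reads needed: {n}, mean coverage: {c:.1f}x, "
          f"expected uncovered fraction: {fraction_uncovered(c):.2e}")
```

For metagenomes the same formula applies per community member, with G replaced by that member's genome size and N by the reads it contributes, which is why low-abundance organisms are often poorly covered.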

All authors contributed equally.



Acknowledgements

This research was supported by the European Community projects MAGICPAH (FP7-KBBE-2009-245226), ULIXES (FP7-KBBE-2010-266473), KILLSPILL (FP7-KBBE-2012-312139) and MicroB3 (OCEAN.2011-2-287589), and by the Royal Society UK-Russia Travel Grant IE130218. The work of IK was supported by RFBR Grant # 13-04-0215715 and RSF Grant # 14-24-00165.

Author information

Correspondence to Peter N. Golyshin.


Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this protocol

Cite this protocol

Toshchakov, S.V., Kublanov, I.V., Messina, E., Yakimov, M.M., Golyshin, P.N. (2015). Genomic Analysis of Pure Cultures and Communities. In: McGenity, T., Timmis, K., Nogales, B. (eds) Hydrocarbon and Lipid Microbiology Protocols. Springer Protocols Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/8623_2015_126


  • DOI: https://doi.org/10.1007/8623_2015_126


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-50449-9

  • Online ISBN: 978-3-662-50450-5

  • eBook Packages: Springer Protocols
