Identification of Natural Product Biosynthetic Gene Clusters from Bacterial Genomic Data

  • Protocol
Methods in Pharmacology and Toxicology

Abstract

The frequent re-isolation of known compounds is one of the main challenges of traditional screening methods in natural product drug discovery. The ability to link natural products to the genes that encode them, and vice versa, has the potential to revolutionize discovery efforts. Increasingly sophisticated bioinformatic tools not only identify biosynthetic genes in sequenced genomes but also predict the product class or structure in silico. This information can then guide the targeted discovery of new compounds. In this chapter, we describe how to prioritize bacterial strains for genome sequencing and how to identify biosynthetic gene clusters in bacterial genomes. We also briefly introduce how comparative genomics can help identify different congeners within a natural product class of interest, and we outline the limitations of structure prediction. Rather than attempting to be exhaustive, we provide examples that the reader can actively follow.
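To make the idea of locating biosynthetic gene clusters concrete, here is a toy sketch. It is not the chapter's protocol and not how any specific tool works internally. Dedicated tools such as antiSMASH detect clusters with profile hidden Markov models and curated detection rules. This sketch only shows the general logic of seeding a region at each "core" biosynthetic gene and merging nearby regions. It assumes every gene has already been annotated with domain labels by some upstream search. The `CORE_DOMAINS` mapping, the domain labels, the `flank` window size, the example genes and all function names are hypothetical. A second helper scores shared domain content between clusters with a Jaccard index, which is a crude stand-in for the comparative-genomics step. Cluster-comparison tools such as MultiGeneBlast compare clusters by sequence homology instead.

```python
from dataclasses import dataclass

# Hypothetical, illustrative mapping from "core" domain labels to product classes.
# Labels are loosely modeled on Pfam/antiSMASH-style names; real tools use
# curated profile HMMs and detection rules rather than a lookup table.
CORE_DOMAINS = {
    "PKS_KS": "polyketide",
    "AMP-binding": "nonribosomal peptide",
    "Terpene_synth": "terpene",
}


@dataclass
class Gene:
    locus_tag: str
    start: int           # start coordinate on the contig (bp)
    end: int             # end coordinate on the contig (bp)
    domains: frozenset   # domain labels from an assumed upstream HMM search


@dataclass
class CandidateCluster:
    start: int
    end: int
    classes: set
    genes: list


def find_candidate_clusters(genes, flank=20_000):
    """Seed a window of +/- `flank` bp around every gene carrying a core domain,
    merge overlapping windows, and report the genes inside each merged region.
    `flank` is an assumed, arbitrary default, not a recommended value."""
    genes = sorted(genes, key=lambda g: g.start)
    windows = []
    for g in genes:
        hits = {CORE_DOMAINS[d] for d in g.domains if d in CORE_DOMAINS}
        if hits:
            windows.append([max(0, g.start - flank), g.end + flank, hits])

    merged = []
    for lo, hi, hits in windows:
        if merged and lo <= merged[-1][1]:
            # Overlaps the previous region: extend it and pool product classes.
            merged[-1][1] = max(merged[-1][1], hi)
            merged[-1][2] |= hits
        else:
            merged.append([lo, hi, set(hits)])

    return [
        CandidateCluster(lo, hi, hits,
                         [g.locus_tag for g in genes if g.end >= lo and g.start <= hi])
        for lo, hi, hits in merged
    ]


def cluster_domains(cluster, genes):
    """Union of the domain labels of all genes assigned to a candidate cluster."""
    tags = set(cluster.genes)
    return frozenset().union(*(g.domains for g in genes if g.locus_tag in tags))


def jaccard(a, b):
    """Jaccard index of two label sets (defined as 1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical toy input: four genes on one contig.
example_genes = [
    Gene("g1", 1_000, 7_000, frozenset({"PKS_KS", "PKS_AT"})),
    Gene("g2", 8_000, 9_500, frozenset({"p450"})),
    Gene("g3", 15_000, 21_000, frozenset({"AMP-binding", "Condensation"})),
    Gene("g4", 120_000, 121_200, frozenset({"Terpene_synth"})),
]

if __name__ == "__main__":
    for c in find_candidate_clusters(example_genes, flank=10_000):
        print(c.start, c.end, sorted(c.classes), c.genes)
```

With `flank=10_000`, the windows around g1 and g3 overlap. They merge into a single region that is flagged as both polyketide and nonribosomal peptide, and the neighboring g2 is swept into that region. The distant g4 forms its own terpene candidate. A fixed window like this is a simplification: it can over- or under-call cluster boundaries, so real boundaries need additional evidence.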

Acknowledgments

The authors acknowledge the Department of Medicinal Chemistry and Pharmacognosy of the University of Illinois at Chicago and the Microbiology/Biotechnology Interfaculty Institute of Microbiology and Infection Medicine of the University of Tübingen for start-up funds during the course of writing this chapter.

Author information

Correspondence to Alessandra S. Eustáquio or Nadine Ziemert.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Eustáquio, A.S., Ziemert, N. (2018). Identification of Natural Product Biosynthetic Gene Clusters from Bacterial Genomic Data. In: Methods in Pharmacology and Toxicology. Humana Press. https://doi.org/10.1007/7653_2018_32

  • DOI: https://doi.org/10.1007/7653_2018_32

  • Publisher Name: Humana Press
