Encyclopedia of Metagenomics

Living Edition
Editors: Karen E. Nelson


  • Eriko Takano
  • Rainer Breitling
  • Marnix H. Medema (email author)
Living reference work entry
DOI: https://doi.org/10.1007/978-1-4614-6418-1_703-4


Keywords: Gene Cluster; Biosynthetic Gene Cluster; Profile Hidden Markov Model; Nonribosomal Peptide Synthetase; Cluster Alignment
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


antiSMASH (Medema et al. 2011) is a web server and stand-alone software tool for identifying, annotating, and comparing gene clusters that encode the biosynthesis of secondary metabolites in bacterial and fungal genomes. It offers a wide range of options for identifying and analyzing biosynthetic gene clusters, including protein domain analysis of the large multi-domain enzymatic assembly lines involved, prediction of the core chemical structures of their end compounds, and multiple cluster alignments against a database of all currently sequenced gene clusters.

The antiSMASH web server can be found at http://antismash.secondarymetabolites.org.


Microbial secondary metabolites are of great interest to society because their diverse biological activities make them promising starting points for drug development. Many are already used as antibiotics, antitumor agents, or cholesterol-lowering drugs (Hutchinson and McDaniel 2001; Fischbach and Walsh 2009). Because it has become affordable to sequence large numbers of genomes from microorganisms that potentially produce novel secondary metabolites, automated computational identification of gene clusters in newly sequenced genomes is becoming a cornerstone of genome-based drug discovery (Walsh and Fischbach 2010).


Gene Cluster Detection

antiSMASH detects a wide range of biosynthetic gene cluster types, including those encoding the pathways toward polyketides (PKs), nonribosomal peptides (NRPs), terpenoids, ribosomal peptides, aminoglycosides, and non-NRP siderophores. Detection is performed by screening the gene sequences from the input against a library of profile Hidden Markov Models (pHMMs) (Eddy 2011), each of which is specific for genes characteristic of a certain gene cluster type, and passing the results through a hierarchical logic filter. A second detection algorithm is also run to find genomic regions that are enriched in Pfam domains (Finn et al. 2010) linked to secondary metabolism.
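The idea of passing pHMM hits through a rule-based filter can be illustrated with a minimal sketch. The rule table, the profile names, the 10 kb merging distance, and the function names below are illustrative assumptions, not the rules or parameters actually used by antiSMASH.

```python
from typing import List, Optional, Set, Tuple

# (gene id, start, end, names of pHMM profiles that hit the gene)
Gene = Tuple[str, int, int, Set[str]]

# Hypothetical rules, checked in order: (cluster type, profiles that must all
# be present in the gene, profiles that veto the call). Profile names are
# placeholders chosen for illustration.
RULES = [
    ("nrps", {"Condensation", "AMP-binding"}, set()),
    ("t1pks", {"PKS_KS", "PKS_AT"}, set()),
    ("terpene", {"Terpene_synth"}, {"PKS_KS"}),
]


def classify_gene(hits: Set[str]) -> Optional[str]:
    """Return the first cluster type whose rule matches the gene's hits."""
    for cluster_type, required, excluded in RULES:
        if required <= hits and not (excluded & hits):
            return cluster_type
    return None


def call_clusters(genes: List[Gene], max_gap: int = 10_000) -> List[dict]:
    """Merge signature genes lying within max_gap bp of each other."""
    clusters: List[dict] = []
    for gene_id, start, end, hits in sorted(genes, key=lambda g: g[1]):
        cluster_type = classify_gene(hits)
        if cluster_type is None:
            continue  # not a signature gene under the assumed rules
        if clusters and start - clusters[-1]["end"] <= max_gap:
            last = clusters[-1]
            last["end"] = max(last["end"], end)
            last["types"].add(cluster_type)
            last["genes"].append(gene_id)
        else:
            clusters.append({"start": start, "end": end,
                             "types": {cluster_type}, "genes": [gene_id]})
    return clusters
```

In this sketch, two neighboring signature genes of different types end up in one hybrid cluster, while a distant signature gene starts a new cluster and genes without a matching rule are ignored.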

Protein Domain Analysis of Polyketide Synthases and Nonribosomal Peptide Synthetases

PKs and NRPs are synthesized by large megasynthase enzymes containing a multitude of protein domains, such as condensation (C), adenylation (A), and PCP-binding domains in nonribosomal peptide synthetases (NRPSs) and ketosynthase (KS), acyltransferase (AT), and ACP-binding domains in polyketide synthases (PKSs) (Fischbach and Walsh 2006). antiSMASH contains a library of pHMMs that can recognize all these protein domains and distinguish between various subtypes of each. In the antiSMASH output, the domain structures of any NRPSs or PKSs encoded in a gene cluster are visualized, and several downstream analysis options are provided for each domain (Fig. 1).
Fig. 1 Domain structure of multi-domain enzymes such as PKSs and NRPSs as visualized by antiSMASH, offering several options for analysis when the mouse is positioned over a domain: one can, for example, run a BlastP search specifically with the sequence of this domain
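One way to turn raw pHMM hits on a single protein into a linear domain architecture is to keep the best-scoring hit wherever hits overlap, which also selects between competing subtype profiles. The following is a minimal sketch under that assumption; the `DomainHit` type, the overlap threshold, and the domain names are hypothetical and do not describe antiSMASH internals.

```python
from typing import List, NamedTuple


class DomainHit(NamedTuple):
    name: str      # name of the pHMM profile (placeholder names in this sketch)
    start: int     # first residue of the hit on the protein
    end: int       # last residue of the hit on the protein
    score: float   # bit score reported by the profile search


def resolve_overlaps(hits: List[DomainHit], max_overlap: int = 20) -> List[DomainHit]:
    """Greedily keep the highest-scoring hits that do not overlap by more
    than max_overlap residues, then return them in N-to-C-terminal order."""
    kept: List[DomainHit] = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if all(min(hit.end, k.end) - max(hit.start, k.start) <= max_overlap
               for k in kept):
            kept.append(hit)
    return sorted(kept, key=lambda h: h.start)


def architecture(hits: List[DomainHit]) -> str:
    """Render the resolved domains as a dash-separated string."""
    return "-".join(h.name for h in resolve_overlaps(hits))
```

A lower-scoring subtype hit that covers the same region as a higher-scoring one is dropped, so each region of the protein is reported once.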

Core Chemical Structure Prediction

When a secondary metabolite biosynthesis gene cluster is detected, a key question is what kind of chemical structure it produces. For NRPs and PKs, antiSMASH can give a first approximation of the core chemical structure of the end compound (Fig. 2). To do so, it uses several substrate specificity prediction methods (Yadav et al. 2003; Minowa et al. 2007; Röttig et al. 2011) that are based on the amino acid sequences of the A domains of NRPSs and the AT domains of PKSs. To infer the sequential arrangement of the predicted substrates of the A/AT domains in the resulting polyketide or peptide, the order of the PKS enzymes in a multimodular assembly line is predicted using their estimated docking domain binding affinities (Yadav et al. 2009) or, alternatively, the PKS or NRPS enzymes are assumed to act in the same order as their genes (colinearity).
Fig. 2 Prediction of the core chemical structure of an NRP by antiSMASH. The residues are based on a consensus between three prediction methods for the substrate specificities of the NRPS adenylation domains in the gene cluster
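A consensus between several substrate predictions can be sketched as a simple majority vote per domain, with the modules then joined in their assumed colinear order. The voting threshold, the `"X"` placeholder for an unresolved residue, and the function names are assumptions made for illustration; the source does not specify how antiSMASH computes its consensus.

```python
from collections import Counter
from typing import List, Optional


def consensus_substrate(predictions: List[Optional[str]], min_votes: int = 2) -> str:
    """Return the substrate named by at least min_votes methods, else "X".

    predictions holds one entry per prediction method; None means the
    method made no call for this domain.
    """
    votes = Counter(p for p in predictions if p is not None)
    if not votes:
        return "X"
    substrate, count = votes.most_common(1)[0]
    return substrate if count >= min_votes else "X"


def core_structure(modules: List[List[Optional[str]]]) -> str:
    """Join per-module consensus calls in the given (colinear) module order."""
    return "-".join(consensus_substrate(m) for m in modules)
```

For example, `core_structure([["val", "val", "ile"], ["ser", "ser", "ser"], ["orn", "lys", None]])` yields `"val-ser-X"`: the third module has no substrate supported by two methods, so it stays unresolved rather than being guessed.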

Comparative Analysis of Gene Clusters

To understand the architecture and function of a secondary metabolite biosynthesis gene cluster, it helps to examine the cluster within its evolutionary context by comparing it with related gene clusters from species across the tree of life. To facilitate this, antiSMASH hosts a regularly updated database of gene clusters it has detected in all nucleotide sequences present in GenBank. antiSMASH then combines multiple BlastP runs into a comparative search of every identified gene cluster against all other known gene clusters. The results are used to generate a multiple gene cluster alignment (Fig. 3), which can help the biologist assess the novelty of the gene cluster, detect its borders, and identify the conserved multigene modules that constitute its building blocks.
Fig. 3 Example of a multiple gene cluster alignment by antiSMASH, showing identified homologous clusters of the query gene cluster
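Combining per-gene BlastP results into a cluster-level comparison can be sketched as follows: for each subject cluster, keep the best hit per query gene, then rank clusters by the number of query genes with a hit and by cumulative bit score. The tuple layout, the ranking criteria, and the `min_shared_genes` cutoff are assumptions for illustration and are not taken from the antiSMASH implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (query gene, subject cluster id, subject gene, bit score)
BlastHit = Tuple[str, str, str, float]


def rank_clusters(hits: Iterable[BlastHit],
                  min_shared_genes: int = 2) -> List[Tuple[str, int, float]]:
    """Rank subject clusters by shared genes, then by cumulative bit score."""
    # best[cluster][query_gene] = highest bit score of that gene in that cluster
    best: Dict[str, Dict[str, float]] = defaultdict(dict)
    for query_gene, cluster, _subject_gene, bitscore in hits:
        previous = best[cluster].get(query_gene, 0.0)
        best[cluster][query_gene] = max(previous, bitscore)

    ranked = [
        (cluster, len(per_gene), sum(per_gene.values()))
        for cluster, per_gene in best.items()
        if len(per_gene) >= min_shared_genes
    ]
    ranked.sort(key=lambda r: (r[1], r[2]), reverse=True)
    return ranked
```

Counting shared genes before summing scores favors clusters that resemble the query across many genes over clusters that share a single, very similar gene.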

Secondary Metabolism-Specific Gene Family Analysis

Most genes involved in the biosynthesis of secondary metabolites have (close) homologues with similar functions in other secondary metabolite biosynthesis gene clusters. Sequence homology can therefore be used to infer the functions of the genes residing in a biosynthetic gene cluster. antiSMASH simplifies this process by categorizing the genes of every identified gene cluster into secondary metabolism-specific gene families and automatically generating approximate phylogenetic trees of each gene in the context of its gene family.

Genome-Wide Pfam and Blast Analysis

Finally, antiSMASH offers the option (transferred from CLUSEAN; Weber et al. 2009) of a comprehensive analysis of all genes within a submitted genome, identifying Pfam matches and running Blast for each gene against a database of all bacterial and fungal protein sequences.

Stand-Alone Version

Stand-alone versions of antiSMASH are available for download for Windows, Mac OS X, and Ubuntu Linux. Additionally, several related scripts are available from the antiSMASH website. An EMBL formatting script can be downloaded to format raw FASTA sequences together with a text file containing gene annotations into an EMBL file that can be submitted to antiSMASH. Also, a script is available which allows running antiSMASH on multiple files, in batch mode.


antiSMASH is still under active development. Some features projected for the next release are batch input on the web server, protein sequence input, and subclass prediction for enzyme classes like terpene synthases and trans-AT PKSs. Feature requests, bug reports, or other questions/suggestions can be sent to the development team via the online contact form on the antiSMASH website.

Related Tools

Several other software tools for the study of secondary metabolism have been published. For example, ClustScan (Starcevic et al. 2008) and NP.searcher (Li et al. 2009) can both be used to detect bacterial polyketide and NRP biosynthesis gene clusters. The same holds for CLUSEAN (Weber et al. 2009), a pipeline that has now been integrated entirely into antiSMASH. For the analysis of fungal sequences, SMURF (Khaldi et al. 2010) offers gene cluster detection capabilities similar to those of antiSMASH. Structural analysis of polyketide synthases can be performed with the SBSPKS suite (Anand et al. 2010). Finally, draft genomes with many small contigs and metagenomes with fragments too small for gene cluster detection can be scrutinized with NaPDoS (Ziemert et al. 2012) to find protein domains related to secondary metabolite biosynthesis and analyze them phylogenetically.


antiSMASH is an easy-to-use web server for the detection of secondary metabolite biosynthesis gene clusters. Its comparative, phylogenomic, and enzymatic analyses, among others, are integrated in a single pipeline, making it straightforward for genomicists and natural product researchers to study the biosynthetic potential of any organism.



References

  1. Anand S, Prasad MV, Yadav G, Kumar N, Shehara J, Ansari MZ, Mohanty D. SBSPKS: structure based sequence analysis of polyketide synthases. Nucleic Acids Res. 2010;38:W487–96.
  2. Eddy SR. Accelerated profile HMM searches. PLoS Comput Biol. 2011;7:e1002195.
  3. Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, et al. The Pfam protein families database. Nucleic Acids Res. 2010;38:D211–22.
  4. Fischbach MA, Walsh CT. Assembly-line enzymology for polyketide and nonribosomal peptide antibiotics: logic, machinery, and mechanisms. Chem Rev. 2006;106:3468–96.
  5. Fischbach MA, Walsh CT. Antibiotics for emerging pathogens. Science. 2009;325:1089–93.
  6. Hutchinson CR, McDaniel R. Combinatorial biosynthesis in microorganisms as a route to new antimicrobial, antitumor and neuroregenerative drugs. Curr Opin Investig Drugs. 2001;2:1681–90.
  7. Khaldi N, Seifuddin FT, Turner G, Haft D, Nierman WC, Wolfe KH, Fedorova ND. SMURF: genomic mapping of fungal secondary metabolite clusters. Fungal Genet Biol. 2010;47:736–41.
  8. Li MH, Ung PM, Zajkowski J, Garneau-Tsodikova S, Sherman DH. Automated genome mining for natural products. BMC Bioinformatics. 2009;10:185.
  9. Medema MH, Blin K, Cimermancic P, de Jager V, Zakrzewski P, Fischbach MA, Weber T, Takano E, Breitling R. antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences. Nucleic Acids Res. 2011;39:W339–46.
  10. Minowa Y, Araki M, Kanehisa M. Comprehensive analysis of distinctive polyketide and nonribosomal peptide structural motifs encoded in microbial genomes. J Mol Biol. 2007;368:1500–17.
  11. Röttig M, Medema MH, Blin K, Weber T, Rausch C, Kohlbacher O. NRPSpredictor2 – a web server for predicting NRPS adenylation domain specificity. Nucleic Acids Res. 2011;39:W362–7.
  12. Starcevic A, Zucko J, Simunkovic J, Long PF, Cullum J, Hranueli D. ClustScan: an integrated program package for the semi-automatic annotation of modular biosynthetic gene clusters and in silico prediction of novel chemical structures. Nucleic Acids Res. 2008;36:6882–92.
  13. Walsh CT, Fischbach MA. Natural products version 2.0: connecting genes to molecules. J Am Chem Soc. 2010;132:2469–93.
  14. Weber T, Rausch C, Lopez P, Hoof I, Gaykova V, Huson DH, Wohlleben W. CLUSEAN: a computer-based framework for the automated analysis of bacterial secondary metabolite biosynthetic gene clusters. J Biotechnol. 2009;140:13–7.
  15. Yadav G, Gokhale RS, Mohanty D. Computational approach for prediction of domain organization and substrate specificity of modular polyketide synthases. J Mol Biol. 2003;328:335–63.
  16. Yadav G, Gokhale RS, Mohanty D. Towards prediction of metabolic products of polyketide synthases: an in silico analysis. PLoS Comput Biol. 2009;5:e1000351.
  17. Ziemert N, Podell S, Penn K, Badger JH, Allen E, Jensen PR. The natural product domain seeker NaPDoS: a phylogeny based bioinformatic tool to classify secondary metabolite gene diversity. PLoS One. 2012;7:e34064.

Copyright information

© Springer Science+Business Media New York 2012

Authors and Affiliations

  • Eriko Takano (1)
  • Rainer Breitling (1)
  • Marnix H. Medema (2), email author

  1. Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
  2. Microbial Genomics and Bioinformatics Research Group, Max Planck Institute for Marine Microbiology, Bremen, Germany