Keywords: Gene Cluster; Biosynthetic Gene Cluster; Profile Hidden Markov Model; Nonribosomal Peptide Synthetase; Cluster Alignment
antiSMASH (Medema et al. 2011) is a web server and stand-alone software package for identifying, annotating, and comparing gene clusters that encode the biosynthesis of secondary metabolites in bacterial and fungal genomes. It offers a wide range of options to identify and analyze biosynthetic gene clusters, including protein domain analysis of the large multi-domain enzymatic assembly lines involved, prediction of the core chemical structures of their end products, and multiple alignment of detected clusters against a database of all currently sequenced gene clusters.
The antiSMASH web server can be found at http://antismash.secondarymetabolites.org.
Microbial secondary metabolites are of great interest to society because their diverse biological activities make them promising starting points for drug development. Many of them are already used as antibiotics, antitumor agents, or cholesterol-lowering drugs (Hutchinson and McDaniel 2001; Fischbach and Walsh 2009). Because it has become affordable to sequence large numbers of genomes from microorganisms that potentially produce novel secondary metabolites, automated computational identification of gene clusters in newly sequenced genomes is becoming a cornerstone of genome-based drug discovery (Walsh and Fischbach 2010).
Gene Cluster Detection
antiSMASH detects a wide range of biosynthetic gene cluster types, including those encoding the pathways toward polyketides (PKs), nonribosomal peptides (NRPs), terpenoids, ribosomal peptides, aminoglycosides, and non-NRP siderophores. Detection is performed by screening the genes in the input sequence against a library of profile Hidden Markov Models (pHMMs) (Eddy 2011), each specific for genes characteristic of a certain gene cluster type, and passing the results through a hierarchical logic filter. In addition, a second algorithm detects genomic regions that are enriched in Pfam domains (Finn et al. 2010) linked to secondary metabolism.
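The rule-based part of this detection step can be sketched as follows. This is a minimal illustration, not antiSMASH's implementation: the profile names, bit-score cutoffs, priority-ordered rules, and the 20 kb gap used to merge neighboring seed genes are all hypothetical values chosen for the example, and the per-gene hit scores are assumed to come from an earlier pHMM search (for example with HMMER).

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    name: str
    start: int
    end: int
    hits: dict = field(default_factory=dict)  # pHMM profile name -> bit score


# Hypothetical per-profile bit-score cutoffs (not antiSMASH's real values).
CUTOFFS = {
    "PKS_KS": 50.0,
    "PKS_AT": 20.0,
    "Condensation": 20.0,
    "AMP-binding": 20.0,
    "Terpene_synth": 25.0,
}

# Hypothetical rules, checked in priority order; the first satisfied rule wins.
RULES = [
    ("t1pks", lambda p: {"PKS_KS", "PKS_AT"} <= p),
    ("nrps", lambda p: {"Condensation", "AMP-binding"} <= p),
    ("terpene", lambda p: "Terpene_synth" in p),
]


def classify(gene):
    """Return the cluster type suggested by a gene's significant hits, or None."""
    passed = {name for name, score in gene.hits.items()
              if score >= CUTOFFS.get(name, float("inf"))}
    for cluster_type, rule in RULES:
        if rule(passed):
            return cluster_type
    return None


def detect_clusters(genes, max_gap=20_000):
    """Merge seed genes lying within max_gap bp of each other into candidate clusters."""
    clusters = []
    for gene in sorted(genes, key=lambda g: g.start):
        cluster_type = classify(gene)
        if cluster_type is None:
            continue  # not a seed gene
        if clusters and gene.start - clusters[-1]["end"] <= max_gap:
            current = clusters[-1]
            current["end"] = max(current["end"], gene.end)
            current["types"].add(cluster_type)
            current["genes"].append(gene.name)
        else:
            clusters.append({"start": gene.start, "end": gene.end,
                             "types": {cluster_type}, "genes": [gene.name]})
    return clusters
```

Checking the rules in a fixed order is one simple way to make the filter hierarchical; merging nearby seeds lets hybrid clusters (here a PKS gene next to an NRPS gene) be reported as one region with several types.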
Protein Domain Analysis of Polyketide Synthases and Nonribosomal Peptide Synthetases
Core Chemical Structure Prediction
Comparative Analysis of Gene Clusters
Secondary Metabolism-Specific Gene Family Analysis
Most genes involved in the biosynthesis of secondary metabolites have (close) homologues with similar functions in other secondary metabolite biosynthesis gene clusters. This homology can be used to infer the functions of the genes in a newly identified biosynthetic gene cluster. antiSMASH simplifies this process by categorizing the genes of every identified gene cluster into secondary metabolism-specific gene families and automatically generating approximate phylogenetic trees of each gene in the context of its gene family.
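The tree-building step can be illustrated with a small sketch. This section does not specify how antiSMASH builds its approximate trees, so the example swaps in UPGMA (average-linkage clustering) on uncorrected p-distances between pre-aligned sequences; the function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences, ignoring gap columns."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        return 1.0
    return sum(x != y for x, y in sites) / len(sites)


def upgma(seqs):
    """Build a rooted Newick tree from {name: aligned sequence} with UPGMA."""
    names = list(seqs)
    # Active clusters: id -> (newick string, number of leaves, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Pairwise distances keyed by (smaller id, larger id)
    dist = {(i, j): p_distance(seqs[names[i]], seqs[names[j]])
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of active clusters
        height = dist[(i, j)] / 2.0
        newick_i, size_i, height_i = clusters.pop(i)
        newick_j, size_j, height_j = clusters.pop(j)
        merged = f"({newick_i}:{height - height_i:.4f},{newick_j}:{height - height_j:.4f})"
        # Average-linkage distance from the merged cluster to every remaining cluster
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            dist[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        # Drop distances involving the two clusters that were merged
        dist = {key: d for key, d in dist.items() if i not in key and j not in key}
        clusters[next_id] = (merged, size_i + size_j, height)
        next_id += 1
    (newick, _, _), = clusters.values()
    return newick + ";"


seqs = {"A": "ACGTACGTAC", "B": "ACGTACGTAA", "C": "TCGAACGTTT"}
print(upgma(seqs))  # (C:0.2000,(A:0.0500,B:0.0500):0.1500);
```

UPGMA assumes a roughly constant rate of evolution across lineages, which is why it only gives an approximate tree; neighbor-joining or maximum-likelihood methods are usually preferred when accuracy matters more than speed.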
Genome-Wide Pfam and Blast Analysis
Finally, antiSMASH also offers the possibility (transferred from CLUSEAN; Weber et al. 2009) to perform a comprehensive analysis of all genes within a submitted genome, identifying Pfam domain matches and running a Blast search for each gene against a database of all bacterial and fungal protein sequences.
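Post-processing genome-wide Blast results typically starts with picking the best hit per gene. The sketch below assumes the results were written in NCBI BLAST+ tabular format (`-outfmt 6`, twelve default columns) and keeps, for each query gene, the hit with the highest bit score that passes a hypothetical E-value cutoff of 1e-5; it is not antiSMASH's own parser, and the sample IDs are made up.

```python
import csv
import io

# Default column order of NCBI BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, max_evalue=1e-5):
    """Return {query: best hit} keeping the highest-bit-score hit per query gene."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(FIELDS, row))
        evalue, bits = float(rec["evalue"]), float(rec["bitscore"])
        if evalue > max_evalue:
            continue  # hypothetical significance cutoff
        query = rec["qseqid"]
        if query not in best or bits > best[query]["bitscore"]:
            best[query] = {"subject": rec["sseqid"], "pident": float(rec["pident"]),
                           "evalue": evalue, "bitscore": bits}
    return best


# Toy tabular output with made-up identifiers.
SAMPLE = (
    "gene1\tprotA\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-150\t520\n"
    "gene1\tprotB\t60.0\t290\t116\t2\t5\t295\t3\t292\t1e-80\t300\n"
    "gene2\tprotC\t30.0\t100\t70\t1\t1\t100\t1\t100\t0.01\t40\n"
)
hits = best_hits(io.StringIO(SAMPLE))
```

Here `gene1` keeps `protA` (bit score 520), while `gene2` is dropped because its only hit exceeds the E-value cutoff.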
Stand-alone versions of antiSMASH are available for download for Windows, Mac OS X, and Ubuntu Linux. Several related scripts are also available from the antiSMASH website. An EMBL formatting script combines raw FASTA sequences with a text file containing gene annotations into an EMBL file that can be submitted to antiSMASH, and a further script allows antiSMASH to be run on multiple files in batch mode.
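What the EMBL formatting script does can be sketched under stated assumptions. The layout of its annotation text file is not described here, so this example assumes a hypothetical tab-separated file with one gene per line (name, start, end, strand) and writes a minimal EMBL-style flat file containing only CDS features with a `/gene` qualifier; it is not the script distributed with antiSMASH, and the function names are hypothetical.

```python
def read_fasta(text):
    """Return (name, lowercase sequence) for a single-record FASTA string."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    name = lines[0][1:].split()[0]
    return name, "".join(lines[1:]).lower()


def read_annotations(text):
    """Parse hypothetical tab-separated lines: gene name, start, end, strand (+/-)."""
    genes = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        name, start, end, strand = line.split("\t")
        genes.append((name, int(start), int(end), strand.strip()))
    return genes


def to_embl(name, seq, genes):
    """Render a minimal EMBL-style record with CDS features and a sequence block."""
    out = [f"ID   {name}; SV 1; linear; genomic DNA; STD; UNC; {len(seq)} BP.", "XX",
           "FH   Key             Location/Qualifiers", "FH"]
    for gene, start, end, strand in genes:
        loc = f"{start}..{end}"
        if strand == "-":
            loc = f"complement({loc})"
        out.append(f"FT   {'CDS':<16}{loc}")          # location starts at column 22
        out.append(f"FT{' ' * 19}/gene=\"{gene}\"")   # qualifier starts at column 22
    out.append("XX")
    counts = {base: seq.count(base) for base in "acgt"}
    other = len(seq) - sum(counts.values())
    out.append(f"SQ   Sequence {len(seq)} BP; {counts['a']} A; {counts['c']} C; "
               f"{counts['g']} G; {counts['t']} T; {other} other;")
    # 60 bases per line in blocks of 10, running position right-aligned to column 80
    for i in range(0, len(seq), 60):
        chunk = seq[i:i + 60]
        blocks = " ".join(chunk[j:j + 10] for j in range(0, len(chunk), 10))
        out.append(f"     {blocks:<65}{min(i + 60, len(seq)):>10}")
    out.append("//")
    return "\n".join(out) + "\n"
```

For example, `to_embl(*read_fasta(fasta_text), read_annotations(annotation_text))` returns the record as a string that can be written to a `.embl` file.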
antiSMASH is still under active development. Features planned for the next release include batch input on the web server, protein sequence input, and subclass prediction for enzyme classes such as terpene synthases and trans-AT PKSs. Feature requests, bug reports, and other questions or suggestions can be sent to the development team via the online contact form on the antiSMASH website.
Several other software tools for the study of secondary metabolism have been published. For example, ClustScan (Starcevic et al. 2008) and NP.searcher (Li et al. 2009) can both detect bacterial polyketide and NRP biosynthesis gene clusters, as can CLUSEAN (Weber et al. 2009), the pipeline that has now been integrated entirely into antiSMASH. For the analysis of fungal sequences, SMURF (Khaldi et al. 2010) offers gene cluster detection capabilities similar to those of antiSMASH. Structural analysis of polyketide synthases can be performed with the SBSPKS suite (Anand et al. 2010). Finally, draft genomes with many small contigs, and metagenomes with fragments too small for gene cluster detection, can be scrutinized with NaPDoS (Ziemert et al. 2012), which finds protein domains related to secondary metabolite biosynthesis and analyzes them phylogenetically.
antiSMASH is an easy-to-use web server for the detection of secondary metabolite biosynthesis gene clusters. It integrates comparative, phylogenomic, enzymatic, and other analyses into a single pipeline, making it straightforward for genomicists and natural product researchers to study the biosynthetic potential of any organism.
- Medema MH, Blin K, Cimermancic P, de Jager V, Zakrzewski P, Fischbach MA, Weber T, Takano E, Breitling R. antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences. Nucleic Acids Res. 2011;39:W339–46.
- Starcevic A, Zucko J, Simunkovic J, Long PF, Cullum J, Hranueli D. ClustScan: an integrated program package for the semi-automatic annotation of modular biosynthetic gene clusters and in silico prediction of novel chemical structures. Nucleic Acids Res. 2008;36:6882–92.