Identification of Protein Secretion Systems in Bacterial Genomes Using MacSyFinder
Protein secretion systems are complex molecular machineries that translocate proteins through the outer membrane, and sometimes through multiple other barriers. They evolved by co-opting components from other envelope-associated cellular machineries, which sometimes makes them difficult to identify and discriminate. Here, we describe how to identify protein secretion systems in bacterial genomes using MacSyFinder. This flexible computational tool uses knowledge from experimental studies to identify homologous systems in genomic data. It can be used with a set of predefined models, TXSScan, to identify all major secretion systems of diderm bacteria (i.e., bacteria with an inner membrane and an LPS-containing outer membrane). To do so, it identifies components of secretion systems by sequence similarity searches with hidden Markov model (HMM) protein profiles and clusters the components that are colocalized in the genome. Finally, it checks whether the genetic content and organization of each cluster satisfy the constraints of the model. TXSScan models can be customized to search for variants of known systems, and models can also be built from scratch to identify novel systems. In this chapter, we describe a complete analysis pipeline: the identification of a reference set of experimentally studied systems, the identification of their components and the construction of protein profiles for them, the definition and optimization of the models, and, finally, their use to search genomic data.
Key words: Comparative genomics, Genome annotation, Bioinformatics detection, Macromolecular systems, Bioinformatic modeling
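The detection logic summarized in the abstract (cluster colocalized component hits, then check each cluster's content against a model) can be illustrated with a minimal Python sketch. It assumes the HMM profile searches have already been run and each hit reduced to a gene position on a linear replicon plus the name of the matched component. The classes and functions (`Hit`, `Model`, `cluster_hits`, `satisfies_model`, `detect_systems`) and the component names are hypothetical; the parameter names only echo the distance and quorum concepts used in MacSyFinder models. This is not MacSyFinder's implementation: it ignores circular replicons, systems encoded at several loci, and constraints on gene order.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Hit:
    position: int   # rank of the gene along the replicon (linear replicon assumed)
    component: str  # name of the component whose profile matched this gene


@dataclass
class Model:
    # Hypothetical model: which components are required, optional, or excluded.
    mandatory: set[str]
    accessory: set[str]
    forbidden: set[str] = field(default_factory=set)
    inter_gene_max_space: int = 5          # max number of genes allowed between two hits
    min_mandatory_genes_required: int = 1  # quorum on distinct mandatory components
    min_genes_required: int = 1            # quorum on distinct mandatory + accessory components


def cluster_hits(hits: list[Hit], max_space: int) -> list[list[Hit]]:
    """Group hits into clusters of colocalized genes.

    Consecutive hits (sorted by position) stay in the same cluster when at most
    `max_space` genes without hits separate them.
    """
    clusters: list[list[Hit]] = []
    for hit in sorted(hits, key=lambda h: h.position):
        if clusters and hit.position - clusters[-1][-1].position - 1 <= max_space:
            clusters[-1].append(hit)
        else:
            clusters.append([hit])
    return clusters


def satisfies_model(cluster: list[Hit], model: Model) -> bool:
    """Check the genetic content of a cluster against the model's constraints."""
    components = {h.component for h in cluster}
    if components & model.forbidden:
        return False  # a forbidden component disqualifies the cluster
    n_mandatory = len(components & model.mandatory)
    n_total = n_mandatory + len(components & model.accessory)
    return (n_mandatory >= model.min_mandatory_genes_required
            and n_total >= model.min_genes_required)


def detect_systems(hits: list[Hit], model: Model) -> list[list[Hit]]:
    """Return the clusters that qualify as candidate systems."""
    return [c for c in cluster_hits(hits, model.inter_gene_max_space)
            if satisfies_model(c, model)]


# Toy example with made-up components A-D (model) and X (forbidden).
model = Model(mandatory={"A", "B", "C"}, accessory={"D"}, forbidden={"X"},
              inter_gene_max_space=3, min_mandatory_genes_required=3,
              min_genes_required=3)
hits = [Hit(10, "A"), Hit(11, "B"), Hit(13, "C"), Hit(14, "D"),
        Hit(40, "A"), Hit(80, "B"), Hit(81, "X"), Hit(82, "C")]
systems = detect_systems(hits, model)
print([[h.position for h in s] for s in systems])  # [[10, 11, 13, 14]]
```

In this toy run, the hits form three clusters: the first satisfies the quorum, the isolated hit at position 40 does not, and the third contains a forbidden component. Keeping clustering and model checking as separate steps mirrors the two stages described in the abstract and makes it easy to vary the distance threshold independently of the quorum rules.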