Applied Microbiology and Biotechnology, Volume 103, Issue 7, pp 3123–3134

GeM-Pro: a tool for genome functional mining and microbial profiling

  • Mariano A. Torres Manno
  • María D. Pizarro
  • Marcos Prunello
  • Christian Magni
  • Lucas D. Daurelio (corresponding author)
  • Martín Espariz (corresponding author)
Genomics, transcriptomics, proteomics


GeM-Pro is a new tool for gene mining and functional profiling of bacteria. It first identifies homologous genes using BLAST and then applies three filtering steps to select orthologous gene pairs. The first filter uses BLAST score values to identify trivial paralogs. The second filter uses the shared identity percentages of the trivial paralogs found as internal witnesses of non-orthology to set orthology cutoff values. The third filter uses conditional probabilities of orthology and non-orthology to define new cutoffs and to generate information supporting the orthology assignments. A subsidiary tool, q-GeM, was also developed to mine traits of interest using logistic regression (LR) or linear discriminant analysis (LDA) classifiers. q-GeM uses computing resources more efficiently than GeM-Pro but needs an initial classified set of homologous genes to train the LR and LDA classifiers. Hence, q-GeM can be used to analyze new sets of strains with available genome sequences without rerunning a complete GeM-Pro analysis. Finally, GeM-Pro and q-GeM perform a synteny analysis that evaluates the integrity and genomic arrangement of specific pathways of interest to infer their presence. The tools were applied to more than 2 million homologous pairs encoded by Bacillus strains, generating statistically supported predictions of trait content. The different patterns of encoded traits of interest were successfully used to perform a descriptive bacterial profiling.
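The first two filtering steps can be illustrated with a minimal Python sketch. It is not the published implementation: the abstract does not give the exact rules, so the sketch assumes that, for each query gene and target genome, the top-scoring BLAST hit is the ortholog candidate and every lower-scoring hit in that genome is a trivial paralog, and that the orthology cutoff for a query is the highest identity observed among its trivial paralogs. The hit tuples, gene identifiers, and function names are all hypothetical.

```python
from collections import defaultdict

# Made-up BLAST-like hits (not real data):
# (query_gene, target_genome, target_gene, bit_score, identity_pct)
HITS = [
    ("srfAA", "strainA", "A_0001", 950.0, 98.2),
    ("srfAA", "strainA", "A_0417", 310.0, 41.5),
    ("srfAA", "strainB", "B_0033", 900.0, 93.0),
    ("srfAA", "strainB", "B_0901", 280.0, 38.9),
    ("srfAA", "strainC", "C_0720", 295.0, 40.1),
]


def split_candidates_and_paralogs(hits):
    """Filter 1 (assumed rule): per query and target genome, keep the
    top-scoring hit as the candidate and flag the rest as trivial paralogs."""
    grouped = defaultdict(list)
    for hit in hits:
        grouped[(hit[0], hit[1])].append(hit)
    candidates, paralogs = [], []
    for group in grouped.values():
        ranked = sorted(group, key=lambda h: h[3], reverse=True)
        candidates.append(ranked[0])
        paralogs.extend(ranked[1:])
    return candidates, paralogs


def identity_cutoffs(paralogs):
    """Filter 2 (assumed rule): per query, the highest identity among its
    trivial paralogs serves as the non-orthology witness, i.e. the cutoff."""
    cutoffs = {}
    for query, _genome, _gene, _score, identity in paralogs:
        cutoffs[query] = max(cutoffs.get(query, 0.0), identity)
    return cutoffs


def select_orthologs(hits):
    """Keep candidates whose identity exceeds the cutoff for their query."""
    candidates, paralogs = split_candidates_and_paralogs(hits)
    cutoffs = identity_cutoffs(paralogs)
    return [h for h in candidates if h[4] > cutoffs.get(h[0], 0.0)]


ORTHOLOGS = select_orthologs(HITS)
for query, genome, gene, score, identity in ORTHOLOGS:
    print(query, genome, gene, identity)
```

In this toy data set the strainC hit is the best hit in its genome, yet it is rejected because its identity falls below that of a known trivial paralog in strainA. This is the sense in which paralogs can act as internal witnesses of non-orthology; the third, probability-based filter is not covered here.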


Keywords: Bacterial profiling; Gene mining; Orthology; Phylogenomic analysis; Plant growth–promoting rhizobacteria; Bacillus



MATM and MDP are CONICET fellows; CM, LD, and ME are researchers at the same institution. MP is a professor at UNR.

Funding information

This study was funded by Agencia Nacional de Promoción Científica y Tecnológica (PICT 2014-1513 to CM, PICT 2015-2361 to ME, PICT-2016-0426 to LD).

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Supplementary material

253_2019_9648_MOESM1_ESM.xls (344 kb)
ESM 1 (XLS 344 kb)
253_2019_9648_MOESM2_ESM.pdf (821 kb)
ESM 2 (PDF 820 kb)



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Laboratorio de Biotecnología e Inocuidad de los Alimentos, Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Rosario, Argentina
  2. Laboratorio de Genética y Fisiología de Bacterias Lácticas, Instituto de Biología Molecular y Celular de Rosario (IBR - CONICET), sede FCByF – UNR, Rosario, Argentina
  3. Laboratorio de Investigaciones en Fisiología y Biología Molecular Vegetal (LIFiBVe), Cátedra de Fisiología Vegetal, Facultad de Ciencias Agrarias, Universidad Nacional del Litoral, Esperanza, Argentina
  4. Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
  5. Área Estadística y Procesamiento de Datos, Departamento de Matemática y Estadística, Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Rosario, Argentina
