Abstract
Multiple genome-wide alignments are a crucial first step in comparing genomes. Gain and loss of genes, duplications, and genomic rearrangements are challenging problems that become more severe with increasing phylogenetic distance. We describe a multiple genome-wide alignment tool for bacteria, called POMAGO, which is based on orthologous genes and their syntenic information as determined by Proteinortho. This strategy enables POMAGO to define anchor points efficiently even across wide phylogenetic distances and to outperform existing approaches in this field of application. The given set of orthologous genes is refined by several cleaning and completion steps, including the addition of previously undetected orthologous genes. Protein-coding genes are aligned at both the nucleotide and the protein level, whereas intergenic regions are aligned at the nucleotide level only. We tested and compared our program on three very different sets of bacteria that span different degrees of phylogenetic distance: 1) 15 closely related, well-examined and well-described E. coli strains, 2) six more divergent Aquificales, as putative basal bacteria, and 3) a set of eight extremely divergent species distributed across the whole phylogenetic tree of bacteria. POMAGO is written in a modular way, which allows algorithms at different stages of the alignment process to be extended or even exchanged. Intergenic regions might, for instance, be aligned using an RNA secondary structure-aware algorithm rather than relying on sequence data alone. The software is freely available from http://www.rna.uni-jena.de/supplements/pomago
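The anchoring idea described in the abstract can be illustrated with a minimal sketch. This is not POMAGO's implementation: the data layout (gene tuples of identifier, start, end), the function names, and the rule of keeping only orthologous groups with exactly one member per genome are all assumptions made for illustration. The sketch selects orthologous groups that could serve as anchor points and lists the intergenic gaps between consecutive genes, which would then be aligned at the nucleotide level.

```python
from typing import Dict, List, Tuple

# Hypothetical gene record: (gene_id, start, end), half-open coordinates.
Gene = Tuple[str, int, int]


def anchors_from_orthologs(
    genomes: Dict[str, List[Gene]],
    ortho_groups: List[Dict[str, str]],
) -> List[Dict[str, Gene]]:
    """Return groups usable as anchors (assumed rule: one gene in every genome).

    Each orthologous group maps a genome name to a gene identifier.
    """
    # Index genes by identifier for each genome.
    index = {name: {g[0]: g for g in genes} for name, genes in genomes.items()}
    anchors = []
    for group in ortho_groups:
        # Skip groups that do not cover all genomes.
        if set(group) != set(genomes):
            continue
        # Skip groups that reference unknown gene identifiers.
        if not all(group[name] in index[name] for name in genomes):
            continue
        anchors.append({name: index[name][group[name]] for name in genomes})
    return anchors


def intergenic_regions(genes: List[Gene]) -> List[Tuple[int, int]]:
    """Return the gaps between consecutive genes, ordered by start position."""
    ordered = sorted(genes, key=lambda g: g[1])
    regions = []
    for (_, _, end_a), (_, start_b, _) in zip(ordered, ordered[1:]):
        # Only a strictly positive gap counts as an intergenic region.
        if start_b > end_a:
            regions.append((end_a, start_b))
    return regions


if __name__ == "__main__":
    genomes = {
        "A": [("a1", 0, 100), ("a2", 150, 300)],
        "B": [("b1", 10, 110), ("b2", 110, 250)],
    }
    groups = [{"A": "a1", "B": "b1"}, {"A": "a2"}]
    print(anchors_from_orthologs(genomes, groups))
    print(intergenic_regions(genomes["A"]))
```

In this toy input, the second group is discarded because it lacks a member in genome B. A real pipeline would also have to handle co-orthologs, strand, and rearranged gene order, which this sketch ignores.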
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wieseke, N., Lechner, M., Ludwig, M., Marz, M. (2013). POMAGO: Multiple Genome-Wide Alignment Tool for Bacteria. In: Cai, Z., Eulenstein, O., Janies, D., Schwartz, D. (eds) Bioinformatics Research and Applications. ISBRA 2013. Lecture Notes in Computer Science(), vol 7875. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38036-5_25
DOI: https://doi.org/10.1007/978-3-642-38036-5_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38035-8
Online ISBN: 978-3-642-38036-5
eBook Packages: Computer Science (R0)