Abstract
We describe GESTALT (GEnomic sequences STeiner ALignmenT), a public-domain suite of programs for generating multiple alignments of a set of biosequences. We allow the use of either of the two popular objectives, Tree Alignment or Sum-of-Pairs. The main distinguishing feature of our method is that the alignment is obtained via a tree in which the internal nodes (ancestors) are labeled by Steiner sequences for triples of the input sequences. Given lists of candidate labels for the ancestral sequences, we use dynamic programming to choose an optimal labeling under either objective function. The fully labeled tree of sequences is then turned into a multiple alignment. Enhancements in our implementation include the traditional space-saving ideas of Hirschberg as well as new data-packing techniques. The running-time bottleneck of computing exact Steiner sequences is avoided by a much faster, yet highly effective, heuristic alternative. Finally, other modules in the suite allow automatic generation of linear-program input files that can be used to compute new lower bounds on the optimal values. We also report on some preliminary computational experiments with GESTALT.
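The labeling step described in the abstract can be illustrated with a minimal sketch for the Tree Alignment objective. This is not GESTALT's code: the function names (`edit_distance`, `label_tree`), the dictionary-based tree representation, and the unit-cost edit distance are all assumptions made for illustration, and the candidate lists are simply taken as given rather than computed as Steiner sequences. The distance routine keeps only two rows of the table, which is the linear-space score computation that Hirschberg's technique builds on.

```python
def edit_distance(a, b):
    """Unit-cost edit distance, keeping only two rows of the DP table.
    (Assumed cost model; the paper's scoring scheme is not given here.)"""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]


def label_tree(children, leaf_seq, candidates, root):
    """Choose one candidate label per internal node so that the sum of
    edit distances over all tree edges is minimal.

    children:   dict mapping each internal node to its list of children
    leaf_seq:   dict mapping each leaf to its (fixed) input sequence
    candidates: dict mapping each internal node to candidate sequences
    Returns (total_cost, labeling) with labeling a dict node -> sequence.
    """
    # best[v][s] = (cost of the subtree under v if v is labeled s,
    #               {child: label chosen for that child})
    best = {}

    def options(v):
        return [leaf_seq[v]] if v in leaf_seq else candidates[v]

    def solve(v):
        kids = children.get(v, [])
        for c in kids:
            solve(c)
        table = {}
        for s in options(v):
            total, choice = 0, {}
            for c in kids:
                cost_c, label_c = min(
                    (best[c][t][0] + edit_distance(s, t), t)
                    for t in options(c))
                total += cost_c
                choice[c] = label_c
            table[s] = (total, choice)
        best[v] = table

    solve(root)
    root_label = min(best[root], key=lambda s: best[root][s][0])

    # Walk back down the tree to recover the chosen labels.
    labeling = {}

    def trace(v, s):
        labeling[v] = s
        for c, t in best[v][s][1].items():
            trace(c, t)

    trace(root, root_label)
    return best[root][root_label][0], labeling
```

With k candidates per internal node, each tree edge costs at most k² distance computations, so the quality of the result depends on how good the candidate lists are. The Sum-of-Pairs objective would need a different cost accumulation and is not covered by this sketch.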
Most of this work was done when this author was visiting CMU during Summer ’98, under a grant from the CMU Faculty Development Fund.
Supported in part by an NSF CAREER grant CCR-9625297.
References
S. Altschul and D. Lipman, Trees, Stars and Multiple Sequence Alignment, SIAM J. Appl. Math. 49 (1989) 197–209
S. Altschul, D. Lipman and J.D. Kececioglu, A tool for multiple sequence alignment. Proc. Natl. Acad. Sci. USA 86 (1989) 4412–4415
V. Bafna, E.L. Lawler and P. Pevzner. Approximation Algorithms for Multiple Sequence Alignment. Proceedings of the 5th Combinatorial Pattern Matching conference LNCS 807 (1994) 43–53
H. Carrillo and D. Lipman. The multiple sequence alignment problem in biology. SIAM J. Appl. Math. 49:1 (1989) 197–209
S.C. Chan, A.K.C. Wong and D.K.Y. Chiu, “A survey of multiple sequence comparison methods,” Bull. Math. Biol. 54 (1992) 563–598
D. Feng and R. Doolittle. Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J. Molec. Evol. 25 (1987) 351–360
O. Gotoh, Optimal alignment between groups of sequences and its application to multiple sequence alignment, CABIOS 9:3 (1993) 361–370
S.K. Gupta, J. Kececioglu, and A.A. Schaffer, Making the Shortest-Paths Approach to Sum-of-Pairs Multiple Sequence Alignment More Space Efficient in Practice, (extended abstract) Proceedings of the 6th Combinatorial Pattern Matching conference (1995)
D. Gusfield, Efficient methods for multiple sequence alignment with guaranteed error bounds, Bulletin of Mathematical Biology 55 (1993) 141–154
D. Gusfield and L. Wang, New Uses for Uniform Lifted Alignments, Submitted for publication (1996)
D.G. Higgins, A.J. Bleasby and R. Fuchs, Clustal V: Improved software for multiple sequence alignment, CABIOS 8 (1992) 189–191
D. Hirschberg, A linear space algorithm for computing maximal common subsequences, Communications of the ACM 18 (1975) 341–343
T. Jiang and F. Liu, Tree Alignment And Reconstruction application software, Version 1.0, February 1998. Available from http://www.dcss.mcmaster.ca/~fliu.
D. Lipman, S. Altschul and J.D. Kececioglu, A tool for multiple sequence alignment. Proc. Natl. Acad. Sci. USA 86 (1989) 4412–4415
S.B. Needleman and C.D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48 (1970) 443–453
M.A. McClure, T.K. Vasi and W.M. Fitch. Comparative analysis of multiple protein sequence alignment methods, Mol. Biol. Evol. 11 (1994) 571–592
R. Ravi and J. Kececioglu. Approximation algorithms for multiple sequence alignment under a fixed evolutionary tree, Proceedings of the 6th Combinatorial Pattern Matching conference (1995) 330–339
D. Sankoff, Minimal mutation trees of sequences, SIAM J. Applied Math. 28(1) (1975) 35–42
D. Sankoff and R. Cedergren, Simultaneous comparison of three or more sequences related by a tree, in D. Sankoff and J. Kruskal, editors, Time warps, string edits and macromolecules: the theory and practice of sequence comparison, Addison Wesley (1983) 253–264
D. Sankoff, R. Cedergren and G. Lapalme, Frequency of insertion-deletion, transversion, and transition in the evolution of the 5S ribosomal RNA, J. Mol. Evol. 7 (1976) 133–149
D. Sankoff, Analytical approaches to genomic evolution, Biochimie 75 (1993) 409–413
T.F. Smith and M.S. Waterman. Comparison of Biosequences. Adv. Appl. Math. (1981) 482–489
W.R. Taylor and D.T. Jones. Deriving an Amino Acid Distance Matrix, J. Theor. Biol. 164 (1993) 65–83
M. Vingron and P. Argos. A fast and sensitive multiple sequence alignment algorithm. Comput. Appl. Biosci. 5 (1989) 115–121
L. Wang and D. Gusfield. Improved Approximation Algorithms for Tree Alignment, Proceedings of the 7th Combinatorial Pattern Matching conference (1996) 220–233
L. Wang and T. Jiang. On the complexity of multiple sequence alignment, J. Comp. Biol. 1 (1994) 337–348
L. Wang, T. Jiang and E.L. Lawler. Aligning sequences via an evolutionary tree: complexity and approximation, Algorithmica, to appear. Also presented at the 26th ACM Symp. on Theory of Computing (1994)
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lancia, G., Ravi, R. (1999). GESTALT: Genomic Steiner Alignments. In: Crochemore, M., Paterson, M. (eds) Combinatorial Pattern Matching. CPM 1999. Lecture Notes in Computer Science, vol 1645. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48452-3_8
DOI: https://doi.org/10.1007/3-540-48452-3_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66278-5
Online ISBN: 978-3-540-48452-3
eBook Packages: Springer Book Archive