Approximation algorithms for multiple sequence alignment

  • Conference paper
Combinatorial Pattern Matching (CPM 1994)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 807)

Abstract

We consider the problem of aligning k sequences of length n. The cost function is the sum-of-pairs cost, and it satisfies the triangle inequality. Earlier approximation results for this problem are due to Gusfield, 1991, who achieved an approximation ratio of 2 − 2/k, and Pevzner, 1992, who improved it to 2 − 3/k. We generalize this approach, assembling an alignment of k sequences from optimally aligned subsets of l < k sequences, to obtain an improved performance guarantee. For arbitrary l < k, we devise deterministic and randomized algorithms yielding performance guarantees of 2 − l/k. For fixed l, the running times of these algorithms are polynomial in n and k.
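As a rough illustration of the quantities named in the abstract, the following sketch computes a sum-of-pairs cost for a given alignment and evaluates the stated guarantee 2 − l/k. The unit column cost, the gap symbol, and the function names `pair_cost`, `sum_of_pairs`, and `guarantee` are assumptions made for this example; the abstract does not specify a scoring scheme, and this is not the paper's algorithm.

```python
from fractions import Fraction
from itertools import combinations
from typing import Sequence


def pair_cost(a: str, b: str) -> int:
    """Assumed unit cost: 0 for identical symbols (including gap vs gap), else 1.

    This discrete cost satisfies the triangle inequality, as the abstract requires.
    """
    return 0 if a == b else 1


def sum_of_pairs(alignment: Sequence[str]) -> int:
    """Sum the column-wise cost over every pair of rows in the alignment."""
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all rows of an alignment must have equal length")
    total = 0
    for row_a, row_b in combinations(alignment, 2):
        total += sum(pair_cost(a, b) for a, b in zip(row_a, row_b))
    return total


def guarantee(l: int, k: int) -> Fraction:
    """Performance guarantee 2 - l/k stated in the abstract.

    The lower bound l >= 2 is an assumption of this sketch; the abstract
    only states l < k.
    """
    if not 2 <= l < k:
        raise ValueError("expected 2 <= l < k")
    return Fraction(2) - Fraction(l, k)


if __name__ == "__main__":
    rows = ["AC-T", "AGGT", "A--T"]  # hypothetical alignment, "-" marks a gap
    print(sum_of_pairs(rows))
    for l in (2, 3, 5):
        print(l, guarantee(l, 10))
```

With l = 2 and l = 3 the guarantee reduces to the 2 − 2/k and 2 − 3/k ratios attributed above to Gusfield and Pevzner.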

The research was supported in part by the National Science Foundation under grant CCR-9308567, the National Institutes of Health under grant R01 HG00987, and DOE grant DE-FG03-90ER60999.

Research supported in part by the DOE grant DE-FG03-90ER60999.

References

  1. Altschul S.F., Lipman D.J., Trees, stars, and multiple biological sequence alignment. SIAM J. Appl. Math., 49, (1989), pp. 197–209.

  2. Baranyai, Z., On the factorization of the complete uniform hypergraph, Infinite and Finite Sets, A. Hajnal, T. Rado, V. T. Sós, eds., North-Holland, Amsterdam, (1975), pp. 91–108.

  3. Bosák, J., Decompositions of Graphs, Kluwer Academic Publishers, (1990).

  4. Carter J.L., Wegman M.N., Universal classes of hash functions, Journal of Computer and System Sciences, 18(1979), pp. 143–154.

  5. Chan S.C., Wong A.K.C., Chiu D.K.Y., A survey of multiple sequence comparison methods, Bull. Math. Biol., 54(1992), pp. 563–598.

  6. Feng D., Doolittle R., Progressive sequence alignment as a prerequisite to correct phylogenetic trees, Journal of Molec. Evol., 25(1987), pp. 351–360.

  7. Gusfield, D., Efficient methods for multiple sequence alignment with guaranteed error bounds. Tech. Report, Computer Science Division, University of California, Davis, CSE-91-4, (1991).

  8. Gusfield, D., Efficient methods for multiple sequence alignment with guaranteed error bounds, Bulletin of Mathematical Biology, 55(1993), pp. 141–154.

  9. Kececioglu J., The maximum weight trace alignment problem in multiple sequence alignment, in: A. Apostolico, M. Crochemore, Z. Galil, U. Manber, eds., Combinatorial Pattern Matching 93, Padova, Italy, June 1993, LNCS 684, pp. 106–119.

  10. Lipman D.J., Altschul S.F., Kececioglu J.D., A tool for multiple sequence alignment, Proc. Natl. Acad. Sci. USA, 86(1989), pp. 4412–4415.

  11. Lorimer, P., Finite Projective Planes and Sharply 2-transitive Subsets of Finite Groups, Proc. Second Internat. Conf. Theory of Groups, Canberra, (1973), pp. 432–436.

  12. Pevzner, P., Multiple Alignment, Communication Cost, and Graph Matching, SIAM J. Applied Math., 52, (1992), pp. 1763–1779.

  13. Sankoff D., Minimum mutation tree of sequences, SIAM J. Appl. Math., 28, (1975), pp. 35–42.

  14. Sankoff D., Simultaneous solution of the RNA folding, alignment and protosequence problems, SIAM J. Appl. Math., 45 (1985), pp. 810–825.

  15. Schmidt J., Siegel A., The analysis of closed hashing under limited randomness, Proceedings of the 22nd ACM Symposium on Theory of Computing, (1990), pp. 224–234.

  16. Wang L., Jiang, T., On the Complexity of Multiple Sequence Alignment, 1993, J. of Comp. Biol. (to appear).

  17. Waterman M.S., Smith T.F., Beyer W.A., Some biological sequence metrics. Adv. in Math., 20(1976), pp. 367–387.

Editor information

Maxime Crochemore Dan Gusfield

Copyright information

© 1994 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bafna, V., Lawler, E.L., Pevzner, P.A. (1994). Approximation algorithms for multiple sequence alignment. In: Crochemore, M., Gusfield, D. (eds) Combinatorial Pattern Matching. CPM 1994. Lecture Notes in Computer Science, vol 807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58094-8_4

  • DOI: https://doi.org/10.1007/3-540-58094-8_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-58094-2

  • Online ISBN: 978-3-540-48450-9

  • eBook Packages: Springer Book Archive
