On incremental computation of transitive closure and greedy alignment

  • Saïd Abdeddaïm
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1264)


Several heuristic algorithms have been proposed for the multiple alignment of sequences. The fastest are often greedy algorithms. At each step, a greedy alignment algorithm must know whether two characters are alignable, given the characters that have already been definitively aligned. We show that this problem reduces to finding paths in a directed graph. We give an incremental algorithm that maintains the transitive closure of a graph for which a spanning set of k disjoint paths is known. Our algorithm maintains the transitive closure of a graph of n vertices and m edges (in the final state) in $O(k^2 m + n \min\{m, n\})$ time and $O(kn)$ space. We show that any greedy alignment algorithm can use this algorithm to decide in constant time whether two characters are alignable, by maintaining the transitive closure of an alignment graph in $O(k^2 n + n^2)$ time and $O(kn)$ space, for k sequences of total length n. As an example application, we have implemented TwoAlign, an efficient multiple alignment program based on the greedy computation of pairwise local alignments.
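To make the O(kn)-space, constant-time-query idea concrete, here is a minimal Python sketch of a reachability table for a graph whose vertices are covered by k disjoint paths. It is an illustration under stated assumptions, not the paper's algorithm: the class name `ChainClosure`, the `(chain, position)` vertex encoding, and the naive update loop are all hypothetical, and the update does not attempt to meet the time bound quoted in the abstract. It also models plain reachability only, not the alignability test itself.

```python
class ChainClosure:
    """Hypothetical reachability table for a digraph covered by k disjoint paths.

    A vertex is a pair (c, p): position p on path (chain) c. Consecutive
    positions on the same chain are implicitly joined by an edge.
    """

    def __init__(self, chain_lengths):
        self.lengths = list(chain_lengths)
        k = len(self.lengths)
        # reach[c][p][j] = smallest position on chain j reachable from (c, p),
        # or lengths[j] when no vertex of chain j is reachable.
        # One row of k integers per vertex gives O(kn) space.
        self.reach = [
            [[p if j == c else self.lengths[j] for j in range(k)]
             for p in range(n_c)]
            for c, n_c in enumerate(self.lengths)
        ]

    def reaches(self, u, v):
        """Constant-time query: is there a path from u to v (u reaches itself)?"""
        (cu, pu), (cv, pv) = u, v
        return self.reach[cu][pu][cv] <= pv

    def add_edge(self, u, v):
        """Insert edge u -> v and update the table (naive, unoptimized loop)."""
        if self.reaches(u, v):
            return  # the closure already contains this pair
        cv, pv = v
        target = self.reach[cv][pv]
        k = len(self.lengths)
        for c in range(k):
            for p in range(self.lengths[c]):
                # Only vertices that already reach u gain new successors.
                if not self.reaches((c, p), u):
                    continue
                row = self.reach[c][p]
                for j in range(k):
                    if target[j] < row[j]:
                        row[j] = target[j]


# Usage: two chains of three vertices each, then one cross edge.
g = ChainClosure([3, 3])
g.add_edge((0, 1), (1, 1))
print(g.reaches((0, 0), (1, 2)))  # True: (0,0) -> (0,1) -> (1,1) -> (1,2)
print(g.reaches((0, 2), (1, 1)))  # False: (0,2) lies after the cross edge
```

Storing only the smallest reachable position per chain is enough because every later position on that chain is reachable through the chain's own edges; this is what keeps the table at k entries per vertex.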





Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Saïd Abdeddaïm (1)
  1. BGBP - UMR CNRS 5558, Université Claude Bernard (LYON 1), Villeurbanne Cedex, France
