Regular Language Constrained Sequence Alignment Revisited

  • Gregory Kucherov
  • Tamar Pinhas
  • Michal Ziv-Ukelson
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6460)


Imposing constraints in the form of a finite automaton or a regular expression is an effective way to incorporate additional a priori knowledge into sequence alignment procedures. With this motivation, Arslan [1] introduced the Regular Language Constrained Sequence Alignment Problem and proposed an $O(n^2 t^4)$ time and $O(n^2 t^2)$ space algorithm for solving it, where $n$ is the length of the input strings and $t$ is the number of states in the non-deterministic automaton, which is given as input. Chung et al. [2] proposed a faster $O(n^2 t^3)$ time algorithm for the same problem. In this paper, we further speed up the algorithms for Regular Language Constrained Sequence Alignment by reducing their worst case time complexity bound to $O(n^2 t^3 / \log t)$. This is done by establishing an optimal bound on the size of Straight-Line Programs solving the maxima computation subproblem of the basic dynamic programming algorithm. We also study another solution based on a Steiner Tree computation. While it does not improve the run time complexity in the worst case, our simulations show that both approaches are efficient in practice, especially when the input automata are dense.
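The abstract does not spell out the maxima computation subproblem. As a minimal sketch, assume it has the following form: given a vector of t scores and t Boolean vectors (one per automaton state), compute for each Boolean vector the maximum of the scores it selects. The Python below is not the authors' Straight-Line Program construction; it is a standard block-tabulation trick that illustrates how sharing partial maxima can bring the number of max operations per query below t. All names (`pack_masks`, `blocked_maxima`, `naive_maxima`) and the choice of block size are hypothetical.

```python
from typing import List, Sequence

NEG_INF = float("-inf")


def naive_maxima(values: Sequence[float], masks: Sequence[Sequence[int]]) -> List[float]:
    """Reference answer: one pass over every mask, about t max operations per row."""
    return [
        max((v for v, bit in zip(values, row) if bit), default=NEG_INF)
        for row in masks
    ]


def pack_masks(masks: Sequence[Sequence[int]], block: int) -> List[List[int]]:
    """Encode each Boolean row as one integer key per block of columns.

    Assumption: the masks come from a fixed automaton, so this packing is
    done once and reused for every score vector.
    """
    packed = []
    for row in masks:
        keys = []
        for start in range(0, len(row), block):
            key = 0
            for offset, bit in enumerate(row[start:start + block]):
                if bit:
                    key |= 1 << offset
            keys.append(key)
        packed.append(keys)
    return packed


def blocked_maxima(values: Sequence[float], packed: Sequence[Sequence[int]], block: int) -> List[float]:
    """Answer every packed mask with one table lookup per block."""
    tables = []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        table = [NEG_INF] * (1 << len(chunk))
        for subset in range(1, len(table)):
            low = (subset & -subset).bit_length() - 1  # index of lowest set bit
            # Reuse the maximum of the subset without its lowest element.
            table[subset] = max(table[subset & (subset - 1)], chunk[low])
        tables.append(table)
    results = []
    for keys in packed:
        best = NEG_INF
        for table, key in zip(tables, keys):
            best = max(best, table[key])
        results.append(best)
    return results


if __name__ == "__main__":
    scores = [3, -1, 7, 2, 5]
    rows = [[1, 0, 0, 1, 0], [0, 0, 0, 0, 0], [1, 1, 1, 1, 1], [0, 1, 0, 0, 1]]
    block_size = 2  # hypothetical choice; roughly (log2 t) / 2 keeps the tables small
    print(blocked_maxima(scores, pack_masks(rows, block_size), block_size))
```

With block size b, building the tables costs about (t/b)·2^b max operations and each row then needs about t/b lookups, so choosing b proportional to log t gives on the order of t^2 / log t max operations for t rows, under the assumptions stated above.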


Keywords: Regular Expression; Steiner Tree; Steiner Minimal Tree; Boolean Vector; Hadamard Code
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Arslan, A.: Regular expression constrained sequence alignment. Journal of Discrete Algorithms 5(4), 647–661 (2007)
  2. Chung, Y., Lu, C., Tang, C.: Efficient algorithms for regular expression constrained sequence alignment. Information Processing Letters 103(6), 240–246 (2007)
  3. Smith, T., Waterman, M.: Identification of common molecular subsequences. Journal of Molecular Biology 147(1), 195–197 (1981)
  4. Arslan, A., Egecioglu, O.: Algorithms for the constrained longest common subsequence problems. International Journal of Foundations of Computer Science 16(6), 1099–1110 (2005)
  5. Chen, Y., Chao, K.: On the generalized constrained longest common subsequence problems. Journal of Combinatorial Optimization, 1–10 (2009)
  6. Iliopoulos, C., Rahman, M.: New efficient algorithms for the LCS and constrained LCS problems. Information Processing Letters 106(1), 13–18 (2008)
  7. Peng, Z., Ting, H.: Time and space efficient algorithms for constrained sequence alignment. In: Domaratzki, M., Okhotin, A., Salomaa, K., Yu, S. (eds.) CIAA 2004. LNCS, vol. 3317, pp. 237–246. Springer, Heidelberg (2005)
  8. Tsai, Y.: The constrained longest common subsequence problem. Information Processing Letters 88(4), 173–176 (2003)
  9. Bairoch, A.: The PROSITE dictionary of sites and patterns in proteins, its current status. Nucleic Acids Research 21(13), 3097 (1993)
  10. Tang, C., Lu, C., Chang, M., Tsai, Y., Sun, Y., Chao, K., Chang, J., Chiou, Y., Wu, C., Chang, H., et al.: Constrained multiple sequence alignment tool development and its application to RNase family alignment. Journal of Bioinformatics and Computational Biology 1(2), 267–287 (2003)
  11. Bern, M., Plassmann, P.: The Steiner problem with edge lengths 1 and 2. Information Processing Letters 32(4), 171–176 (1989)
  12. Shi, W., Su, C.: The rectilinear Steiner arborescence problem is NP-complete. SIAM Journal on Computing 35(3), 729–740 (2006)
  13. Foulds, L., Graham, R.: The Steiner problem in phylogeny is NP-complete. Advances in Applied Mathematics 3(43-49), 299 (1982)
  14. Jia, W., Han, B., Au, P., He, Y., Zhou, W.: Optimal multicast tree routing for cluster computing in hypercube interconnection networks. IEICE Transactions on Information and Systems E87-D, 1625–1632 (2004)
  15. Lin, X., Ni, L.: Multicast communication in multicomputer networks. IEEE Transactions on Parallel and Distributed Systems 4(10), 1105–1117 (1993)
  16. Sheu, S., Yang, C.: Multicast algorithms for hypercube multiprocessors. Journal of Parallel and Distributed Computing 61(1), 137–149 (2001)
  17. Dinur, I., Safra, S.: On the hardness of approximating minimum vertex cover. Annals of Mathematics 162(1), 439–486 (2005)
  18. Sylvester, J.: Thoughts on inverse orthogonal matrices, simultaneous sign successions, and tessellated pavements in two or more colors, with applications to Newton’s rule, ornamental tile-work and the theory of numbers. Phil. Mag. 34(2), 461–475 (1867)
  19. Seberry, J., Yamada, M.: Hadamard matrices, sequences, and block designs. Contemporary Design Theory: A Collection of Surveys, 431–560 (1992)
  20. Savage, J.: An algorithm for the computation of linear forms. SIAM J. Comput. 3(2), 150–158 (1974)
  21. Hromkovič, J., Seibert, S., Wilke, T.: Translating regular expressions into small ε-free nondeterministic finite automata. Journal of Computer and System Sciences 62(4), 565–588 (2001)
  22. Schnitger, G.: Regular expressions and NFAs without epsilon-transitions. In: Durand, B., Thomas, W. (eds.) STACS 2006. LNCS, vol. 3884, p. 432. Springer, Heidelberg (2006)
  23. Geffert, V.: Translation of binary regular expressions into nondeterministic ε-free automata with O(n log n) transitions. Journal of Computer and System Sciences 66(3), 451–472 (2003)

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Gregory Kucherov (1)
  • Tamar Pinhas (2)
  • Michal Ziv-Ukelson (2)

  1. LIFL/CNRS and INRIA Lille Nord-Europe, Villeneuve d’Ascq, France
  2. Department of Computer Science, Ben-Gurion University of the Negev, Be’er Sheva, Israel
