Regular Expression Constrained Sequence Alignment

  • Abdullah N. Arslan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3537)


Given strings $S_1$ and $S_2$ and a regular expression $R$, we introduce regular expression constrained sequence alignment: the problem of finding the maximum alignment score between $S_1$ and $S_2$ over all alignments that contain a segment in which some substring $s_1$ of $S_1$ is aligned with some substring $s_2$ of $S_2$ and both $s_1$ and $s_2$ match $R$, i.e. $s_1, s_2 \in L(R)$, where $L(R)$ is the regular language described by $R$. A motivation for the problem is that protein sequences can be aligned so that known motifs guide the alignments. We present an $O(nmr)$ time algorithm for this problem, where $n$ and $m$ are the lengths of $S_1$ and $S_2$, respectively, and $r$ is on the order of the size of the transition function of a finite automaton $M$ that we create from a nondeterministic finite automaton $N$ accepting $L(R)$. $M$ contains $O(t^2)$ states if $N$ has $t$ states.
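The following is a minimal sketch of how such a constrained alignment can be computed by dynamic programming. It is our own illustration, not the paper's algorithm: it does not reproduce the author's construction of $M$, and instead tracks pairs of states of an epsilon-free NFA directly (one component reading $s_1$, the other reading $s_2$), which likewise gives $O(t^2)$ state pairs per DP cell. The function name `rec_align`, the tuple encoding of the NFA, and the linear scoring (match +1, mismatch -1, gap -1) are hypothetical choices. Each cell $(i, j)$ is split into three phases: before the constrained segment, inside it, and after it.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

NEG = float("-inf")

# Hypothetical NFA encoding (assumption, not from the paper): an epsilon-free
# NFA given as (start state, accepting states, {(state, symbol): next states}).
NFA = Tuple[int, FrozenSet[int], Dict[Tuple[int, str], Set[int]]]


def rec_align(s1: str, s2: str, nfa: NFA, match: int = 1,
              mismatch: int = -1, gap: int = -1) -> Optional[float]:
    """Best global alignment score of s1 and s2 such that some contiguous
    block of columns aligns a substring of s1 with a substring of s2 and the
    NFA accepts both substrings. Returns None if no such alignment exists.
    Illustrative sketch only; it tracks NFA state pairs instead of the
    paper's automaton M."""
    start, accept, delta = nfa
    n, m = len(s1), len(s2)

    def step(p: int, c: str) -> Set[int]:
        return delta.get((p, c), set())

    # Phase 0: block not opened yet; phase 2: block already closed.
    before = [[NEG] * (m + 1) for _ in range(n + 1)]
    after = [[NEG] * (m + 1) for _ in range(n + 1)]
    # Phase 1: inside[i][j] maps an NFA state pair (p, q) to the best score;
    # p reads the s1 part of the block, q reads the s2 part.
    inside: List[List[Dict[Tuple[int, int], float]]] = [
        [{} for _ in range(m + 1)] for _ in range(n + 1)]
    before[0][0] = 0

    def relax(i: int, j: int, pq: Tuple[int, int], val: float) -> None:
        if val > inside[i][j].get(pq, NEG):
            inside[i][j][pq] = val

    for i in range(n + 1):
        for j in range(m + 1):
            # Open the block at this cell: both substrings start out empty.
            if before[i][j] > NEG:
                relax(i, j, (start, start), before[i][j])
            # Close the block at this cell if both runs are accepting.
            for (p, q), val in inside[i][j].items():
                if p in accept and q in accept:
                    after[i][j] = max(after[i][j], val)
            diag = i < n and j < m
            sub = (match if s1[i] == s2[j] else mismatch) if diag else 0
            # Outside the block, columns are scored as in ordinary alignment.
            for table in (before, after):
                v = table[i][j]
                if v == NEG:
                    continue
                if diag:
                    table[i + 1][j + 1] = max(table[i + 1][j + 1], v + sub)
                if i < n:
                    table[i + 1][j] = max(table[i + 1][j], v + gap)
                if j < m:
                    table[i][j + 1] = max(table[i][j + 1], v + gap)
            # Inside the block, each column also advances the NFA runs on
            # whichever of s1[i] / s2[j] it consumes.
            for (p, q), v in inside[i][j].items():
                if diag:
                    for p2 in step(p, s1[i]):
                        for q2 in step(q, s2[j]):
                            relax(i + 1, j + 1, (p2, q2), v + sub)
                if i < n:
                    for p2 in step(p, s1[i]):
                        relax(i + 1, j, (p2, q), v + gap)
                if j < m:
                    for q2 in step(q, s2[j]):
                        relax(i, j + 1, (p, q2), v + gap)

    best = after[n][m]
    return None if best == NEG else best
```

For example, with the single-letter pattern R = C, aligning AAC with CAA scores 0 when the NFA accepts only the empty string (so the segment may be empty and the constraint is vacuous), but -3 under the constraint, because the two C residues must share a segment that contains no other residues. If neither string contains a substring matching R, the sketch returns None.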


Keywords: Regular expression, sequence alignment, dynamic programming, pattern matching, finite automaton





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Abdullah N. Arslan (1)
  1. Department of Computer Science, The University of Vermont, Burlington, USA
