Regular Expression Constrained Sequence Alignment
Given strings $S_1$ and $S_2$ and a regular expression $R$, we introduce regular expression constrained sequence alignment: the problem of finding the maximum alignment score between $S_1$ and $S_2$ over all alignments that contain a segment in which some substring $s_1$ of $S_1$ is aligned with some substring $s_2$ of $S_2$, and both $s_1$ and $s_2$ match $R$, i.e., $s_1, s_2 \in L(R)$, where $L(R)$ is the regular language described by $R$. One motivation for the problem is that protein sequences can be aligned so that known motifs guide the alignment. We present an $O(nmr)$-time algorithm for the regular expression constrained sequence alignment problem, where $n$ and $m$ are the lengths of $S_1$ and $S_2$, respectively, and $r$ is on the order of the size of the transition function of a finite automaton $M$ that we construct from a nondeterministic finite automaton $N$ accepting $L(R)$. If $N$ has $t$ states, $M$ has $O(t^2)$ states.
Keywords: regular expression, sequence alignment, dynamic programming, pattern matching, finite automaton
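To make the problem definition concrete, here is a minimal Python sketch. It is not the authors' algorithm and does not claim the $O(nmr)$ bound. It runs an ordinary global-alignment dynamic program in which each cell also records a phase: before the constrained segment, inside it, or after it. Inside the segment it tracks a pair of states of an epsilon-free NFA, one state per sequence, loosely in the spirit of the $O(t^2)$-state automaton $M$. The NFA encoding `(start, accepting, delta)`, the function names `relax` and `constrained_alignment_score`, and the linear scoring scheme (match +1, mismatch -1, gap -1) are all assumptions made for illustration.

```python
# Hypothetical encoding of an epsilon-free NFA: (start, accepting, delta), where
# delta maps (state, character) to an iterable of successor states.
NEG = float("-inf")


def relax(cell, key, value):
    """Keep the larger of the stored score and the new score for a phase key."""
    if value > cell.get(key, NEG):
        cell[key] = value


def constrained_alignment_score(s1, s2, nfa, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of s1 and s2 over alignments containing a
    block of columns whose s1-part and s2-part are both accepted by `nfa`.
    Returns -inf when no such alignment exists."""
    start, accepting, delta = nfa
    n, m = len(s1), len(s2)
    # best[i][j] maps a phase key to the best score for aligning s1[:i] with s2[:j]:
    #   "pre"  : constrained segment not yet opened
    #   (p, q) : inside the segment, NFA in state p on s1's part and q on s2's part
    #   "post" : segment closed with both parts accepted
    best = [[{} for _ in range(m + 1)] for _ in range(n + 1)]
    best[0][0]["pre"] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            cell = best[i][j]
            # Alignment columns entering (i, j): (a, b), (a, -) and (-, b).
            moves = []
            if i > 0 and j > 0:
                a, b = s1[i - 1], s2[j - 1]
                moves.append((best[i - 1][j - 1], a, b, match if a == b else mismatch))
            if i > 0:
                moves.append((best[i - 1][j], s1[i - 1], None, gap))
            if j > 0:
                moves.append((best[i][j - 1], None, s2[j - 1], gap))
            for prev, a, b, score in moves:
                for key, val in prev.items():
                    if key == "pre" or key == "post":
                        relax(cell, key, val + score)
                        continue
                    p, q = key
                    # A gap leaves that side's automaton state unchanged.
                    ps = delta.get((p, a), ()) if a is not None else (p,)
                    qs = delta.get((q, b), ()) if b is not None else (q,)
                    for p2 in ps:
                        for q2 in qs:
                            relax(cell, (p2, q2), val + score)
            # Phase changes that consume no characters: open, then close, the segment.
            if "pre" in cell:
                relax(cell, (start, start), cell["pre"])
            for key, val in list(cell.items()):
                if isinstance(key, tuple) and key[0] in accepting and key[1] in accepting:
                    relax(cell, "post", val)
    return best[n][m].get("post", NEG)


# Toy NFA for R = AC*G: states 0, 1, 2; start 0; accepting {2}.
toy_nfa = (0, {2}, {(0, "A"): {1}, (1, "C"): {1}, (1, "G"): {2}})
print(constrained_alignment_score("ACG", "AGG", toy_nfa))  # 0
```

In the usage line, the unconstrained optimum for `ACG` and `AGG` scores 1, but the only substrings matching `AC*G` are `ACG` and `AG`, so the constraint forces a gap after the segment and the score drops to 0. Because scores are maximized over all nondeterministic choices, the sketch needs no subset construction: any run of the NFA that reaches an accepting pair witnesses a valid segment. Its cost is roughly $O(nm \cdot t^2 \cdot d)$ for NFA branching factor $d$.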