Regular Expression Constrained Sequence Alignment

  • Abdullah N. Arslan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3537)

Abstract

Given strings S1 and S2 and a regular expression R, we introduce regular expression constrained sequence alignment as the problem of finding the maximum alignment score between S1 and S2 over all alignments that contain a segment in which some substring s1 of S1 is aligned with some substring s2 of S2, and both s1 and s2 match R, i.e. s1, s2 ∈ L(R), where L(R) is the regular language described by R. A motivation for the problem is that protein sequences can be aligned in a way that known motifs guide the alignments. We present an O(nmr) time algorithm for the regular expression constrained sequence alignment problem, where n and m are the lengths of S1 and S2, respectively, and r is on the order of the size of the transition function of a finite automaton M that we create from a nondeterministic finite automaton N accepting L(R). M contains O(t^2) states if N has t states.
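
To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the O(nmr) automaton-based algorithm of the paper; it simply enumerates every pair of substrings that match R and scores the prefix, the constrained segment, and the suffix with ordinary global alignment. The scoring scheme (match +1, mismatch -1, gap -1), the function names, and the use of Python's `re` syntax for R are all assumptions made for illustration.

```python
import re


def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of a and b with a linear gap cost.

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def matching_spans(s, pattern):
    """Return all (start, end) such that s[start:end] is in L(pattern)."""
    rx = re.compile(pattern)
    return [
        (i, j)
        for i in range(len(s) + 1)
        for j in range(i, len(s) + 1)
        if rx.fullmatch(s[i:j])
    ]


def constrained_score(s1, s2, pattern):
    """Best score over alignments containing a segment that aligns a
    substring of s1 with a substring of s2, both matching the pattern.

    Returns None when no such pair of substrings exists.
    Brute force: exponentially simpler than, and much slower than, an
    automaton-guided dynamic program.
    """
    best = None
    for i, j in matching_spans(s1, pattern):
        for k, l in matching_spans(s2, pattern):
            total = (
                global_score(s1[:i], s2[:k])      # part before the segment
                + global_score(s1[i:j], s2[k:l])  # constrained segment
                + global_score(s1[j:], s2[l:])    # part after the segment
            )
            if best is None or total > best:
                best = total
    return best
```

For example, under these assumed scores `constrained_score("AGCT", "AGCT", "GC")` evaluates to 4, and the function returns `None` when either string has no substring in L(R). A baseline like this is mainly useful for checking a faster implementation on short inputs.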

Keywords

Regular expression, sequence alignment, dynamic programming, pattern matching, finite automaton


Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Abdullah N. Arslan (1)
  1. Department of Computer Science, The University of Vermont, Burlington, USA
