# Constrained pairwise and center-star sequences alignment problems

## Abstract

Sequence alignment is a fundamental problem in computational biology and is also important in theoretical computer science. In this paper, we consider the problem of aligning a set of sequences subject to a given constrained sequence. Given two sequences \(A=a_1a_2\ldots a_n\) and \(B=b_1b_2\ldots b_n\) with a given distance function and a constrained sequence \(C=c_1c_2\ldots c_k\), our goal is to find the optimal sequence alignment of *A* and *B* w.r.t. the constraint *C*. We investigate several variants of this problem. If \(C=c^k\), i.e., all characters in *C* are the same, the optimal constrained pairwise sequence alignment (CPSA) can be computed in \(O(\min \{kn^2,(t-k)n^2\})\) time, where *t* is the minimum number of occurrences of character *c* in *A* and *B*. For the variant called GB-CPSA, in which the alignment score between any two consecutive constrained characters in the final alignment is upper bounded by a given value, we give a dynamic programming algorithm with time complexity \(O(kn^4/\log n)\). For the constrained center-star sequence alignment (CCSA), we prove that it is NP-hard to find an optimal alignment, even over the binary alphabet. Furthermore, we show a negative result for CCSA: there is no polynomial-time algorithm that approximates CCSA within any constant ratio.
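To make the basic problem concrete, the following is a minimal sketch of the textbook \(O(kn^2)\) dynamic program for constrained pairwise alignment, not the faster algorithms summarized in the abstract. It assumes that a constraint is satisfied when each character \(c_l\) appears, in order, in a column where both sequences carry that character, and that the distance function is a simple edit-style cost. The function name `cpsa_cost` and the parameters `mismatch` and `gap` are hypothetical choices made for illustration.

```python
def cpsa_cost(a, b, c, mismatch=1, gap=1):
    """Minimum alignment cost of a and b whose columns contain c in order.

    Assumed cost model (illustrative only): identical characters cost 0,
    a substitution costs `mismatch`, and a gap costs `gap`.
    Returns float("inf") when no alignment satisfies the constraint.
    """
    INF = float("inf")
    n, m, k = len(a), len(b), len(c)
    # d[l][i][j]: best cost of aligning a[:i] with b[:j] such that the
    # alignment contains constrained columns for c[:l], in order.
    d = [[[INF] * (m + 1) for _ in range(n + 1)] for _ in range(k + 1)]
    for l in range(k + 1):
        for i in range(n + 1):
            for j in range(m + 1):
                if i == 0 and j == 0:
                    # The empty alignment satisfies only the empty constraint.
                    d[l][i][j] = 0 if l == 0 else INF
                    continue
                best = INF
                if i > 0:  # a[i-1] aligned with a gap
                    best = min(best, d[l][i - 1][j] + gap)
                if j > 0:  # b[j-1] aligned with a gap
                    best = min(best, d[l][i][j - 1] + gap)
                if i > 0 and j > 0:
                    # Ordinary (unconstrained) column.
                    step = 0 if a[i - 1] == b[j - 1] else mismatch
                    best = min(best, d[l][i - 1][j - 1] + step)
                    # Constrained column: both characters equal c[l-1].
                    if l > 0 and a[i - 1] == b[j - 1] == c[l - 1]:
                        best = min(best, d[l - 1][i - 1][j - 1])
                d[l][i][j] = best
    return d[k][n][m]
```

For example, with `mismatch=1` and `gap=2`, aligning `"ab"` with `"ba"` costs 2 without a constraint (two substitutions), while the constraint `"a"` forces the two `a` characters into one column and raises the cost to 4 (two gaps). The constraint `"ab"` cannot be satisfied at all, because the two characters occur in opposite orders in the two sequences.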

## Keywords

Sequence alignment, Dynamic programming, Complexity

## Notes

### Acknowledgments

The authors thank the anonymous referees for their helpful comments to improve the presentation of this paper. This work was supported by NSFC (61433012, U1435215, 11171086), HK RGC Grant (HKU 7114/13E, HKU 7164/12E, HKU 7111/12E), HKU small project funding 201309176064, Natural Science Foundation of Hebei A2013201218, Chinese Academy of Sciences research Grant (No. KGZD-EW-103-5(9)), Fundamental Research Foundation of Northwestern Polytechnical University in China (Grant No. JC201164), Fundamental Research Funds for the Central Universities (Grant No. 3102015ZY081), and China Postdoctoral Science Foundation (Grant No. 2012M521803).
