Journal of Combinatorial Optimization, Volume 32, Issue 1, pp 79–94

Constrained pairwise and center-star sequences alignment problems

  • Yong Zhang
  • Joseph Wun-Tat Chan
  • Francis Y. L. Chin
  • Hing-Fung Ting
  • Deshi Ye
  • Feng Zhang
  • Jianyu Shi

Abstract

Sequence alignment is a fundamental problem in computational biology and is also important in theoretical computer science. In this paper, we consider the problem of aligning a set of sequences subject to a given constrained sequence. Given two sequences \(A=a_1a_2\ldots a_n\) and \(B=b_1b_2\ldots b_n\) with a given distance function and a constrained sequence \(C=c_1c_2\ldots c_k\), our goal is to find the optimal sequence alignment of A and B with respect to the constraint C. We investigate several variants of this problem. If \(C=c^k\), i.e., all characters in C are the same, the optimal constrained pairwise sequence alignment can be computed in \(O(\min \{kn^2,(t-k)n^2\})\) time, where t is the minimum number of occurrences of character c in A and B. For the variant in which the alignment score between any two consecutive constrained characters in the final alignment is upper bounded by some value, called GB-CPSA, we give a dynamic programming algorithm with time complexity \(O(kn^4/\log n)\). For the constrained center-star sequence alignment (CCSA), we prove that finding the optimal alignment is NP-hard even over the binary alphabet. Furthermore, we show a negative result for CCSA: there is no polynomial-time algorithm that approximates CCSA within any constant ratio.
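
To make the pairwise problem concrete, the following is a minimal sketch of a textbook-style \(O(kn^2)\) dynamic program for constrained pairwise alignment. It is not the algorithm of this paper, and it does not achieve the improved \(O(\min \{kn^2,(t-k)n^2\})\) bound. It assumes the usual definition from the constrained-alignment literature: each character of C must appear, in order, in a column where both A and B hold that character. The function name `constrained_alignment_cost`, the `dist` callback, and the gap symbol `"-"` are illustrative assumptions.

```python
from math import inf


def constrained_alignment_cost(a, b, c, dist, gap="-"):
    """Minimum alignment cost of a and b such that the characters of c
    appear, in order, as columns (x, x) of the alignment.

    dist(x, y) is the column cost; dist(x, gap) and dist(gap, y) are the
    costs of aligning a character with a gap. Returns inf if no alignment
    satisfies the constraint.
    """
    n, m, k = len(a), len(b), len(c)
    # D[l][i][j]: best cost of aligning a[:i] with b[:j] when the first
    # l characters of c have already been placed in matched columns.
    D = [[[inf] * (m + 1) for _ in range(n + 1)] for _ in range(k + 1)]
    D[0][0][0] = 0
    for l in range(k + 1):
        for i in range(n + 1):
            for j in range(m + 1):
                if l == 0 and i == 0 and j == 0:
                    continue
                best = inf
                if i > 0:  # a[i-1] aligned with a gap
                    best = min(best, D[l][i - 1][j] + dist(a[i - 1], gap))
                if j > 0:  # b[j-1] aligned with a gap
                    best = min(best, D[l][i][j - 1] + dist(gap, b[j - 1]))
                if i > 0 and j > 0:
                    col = dist(a[i - 1], b[j - 1])
                    # ordinary column that places no constraint character
                    best = min(best, D[l][i - 1][j - 1] + col)
                    # column that places c[l-1]
                    if l > 0 and a[i - 1] == b[j - 1] == c[l - 1]:
                        best = min(best, D[l - 1][i - 1][j - 1] + col)
                D[l][i][j] = best
    return D[k][n][m]


def example_dist(x, y):
    # Hypothetical cost function: match 0, substitution 1, gap 2.
    if x == y:
        return 0
    return 2 if "-" in (x, y) else 1
```

With `example_dist`, aligning `"ab"` with `"ba"` costs 2 when C is empty (two substitutions) and 4 when C is `"a"`, because forcing the column (a, a) leaves one `b` on each side that must be aligned with a gap. The table has \((k+1)(n+1)^2\) entries for sequences of equal length n, each filled in constant time, which gives the \(O(kn^2)\) running time.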

Keywords

Sequence alignment · Dynamic programming · Complexity

Notes

Acknowledgments

The authors thank the anonymous referees for their helpful comments to improve the presentation of this paper. This work was supported by NSFC (61433012, U1435215, 11171086), HK RGC Grant (HKU 7114/13E, HKU 7164/12E, HKU 7111/12E), HKU small project funding 201309176064, Natural Science Foundation of Hebei A2013201218, Chinese Academy of Sciences research Grant (No. KGZD-EW-103-5(9)), Fundamental Research Foundation of Northwestern Polytechnical University in China (Grant No. JC201164), Fundamental Research Funds for the Central Universities (Grant No. 3102015ZY081), and China Postdoctoral Science Foundation (Grant No. 2012M521803).


Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Yong Zhang (1, 3)
  • Joseph Wun-Tat Chan (2)
  • Francis Y. L. Chin (3)
  • Hing-Fung Ting (3)
  • Deshi Ye (4)
  • Feng Zhang (5)
  • Jianyu Shi (6)

  1. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
  2. College of International Education, Hong Kong Baptist University, Kowloon, Hong Kong, China
  3. Department of Computer Science, The University of Hong Kong, Pok Fu Lam, Hong Kong, China
  4. College of Computer Science, Zhejiang University, Hangzhou, China
  5. College of Mathematics and Information Science, Hebei University, Baoding, China
  6. School of Life Science, Northwestern Polytechnical University, Xi’an, China
