# Chemical reaction optimization for solving longest common subsequence problem for multiple string

## Abstract

The longest common subsequence (LCS) problem is a well-known NP-hard optimization problem that seeks the longest subsequence common to every member of a given set of strings. In computational biology, sequence alignment is a fundamental technique for measuring the similarity of biological sequences, such as DNA and genome sequences. High sequence similarity often implies molecular structural and functional similarity, and can be used to determine whether (and how) sequences are related. Finding the LCS is one way to measure the similarity of sequences. It also has applications in data compression, FPGA circuit minimization, and bioinformatics. Exact algorithms are impractical because they cannot solve the problem for multiple long strings in polynomial time. Several approximation, heuristic, and metaheuristic methods have been proposed to solve the problem. Chemical reaction optimization (CRO) is a recent metaheuristic that applies the behavior of chemical reactions to optimization problems. In this paper, we propose a chemical reaction optimization technique to solve the longest common subsequence problem for multiple strings (MLCS). We have redesigned the four elementary operators of CRO for the LCS problem. The operators of the CRO algorithm are used to explore the search space both locally and globally. A novel correction method has been designed to correct the solution. The correction method runs after each search operator to ensure the validity of the changes made by the operators. Both solution quality and execution time are considered in designing the operators and the correction method. The proposed system thus brings robustness, efficiency, and effectiveness to solving the MLCS problem. Our approach is compared with hyper-heuristic, ant colony optimization, beam ant colony optimization, and memory-bounded anytime algorithms. Measured by the length of the returned common subsequences, the experimental results show that the proposed algorithm gives the same or better results than all the other algorithms in less execution time.
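To make the validity requirement concrete, the following minimal Python sketch checks whether a candidate string is a common subsequence of a set of strings, and repairs an invalid candidate by dropping offending characters. It is only an illustration of what "correcting" a solution can mean. The function names and the greedy repair rule are assumptions made here, not the correction method designed in the paper.

```python
from typing import Sequence


def is_subsequence(candidate: str, text: str) -> bool:
    """Return True if candidate can be obtained from text by deleting characters."""
    it = iter(text)
    # `ch in it` consumes the iterator up to the first match, so order is respected.
    return all(ch in it for ch in candidate)


def is_common_subsequence(candidate: str, strings: Sequence[str]) -> bool:
    """Return True if candidate is a subsequence of every string in strings."""
    return all(is_subsequence(candidate, s) for s in strings)


def greedy_repair(candidate: str, strings: Sequence[str]) -> str:
    """Hypothetical repair step (not the paper's method): keep a character only
    if the kept prefix plus that character is still a common subsequence."""
    kept = ""
    for ch in candidate:
        if is_common_subsequence(kept + ch, strings):
            kept += ch
    return kept


strings = ["ACGTAC", "AGTCAC", "CAGTAC"]
print(is_common_subsequence("AGTAC", strings))  # True
print(greedy_repair("AXGTTAC", strings))        # AGTAC
```

A repair of this kind always returns a valid common subsequence, but it need not return the longest one. That gap is what the search operators of a metaheuristic are meant to close.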

## Keywords

Algorithm, Chemical reaction optimization, Longest common subsequence, NP-hard, Optimization

## Notes

## Compliance with Ethical Standards

## Conflict of interest

The authors declare that they have no conflict of interest.

## Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.
