1 Introduction

Multiple sequence alignment (MSA) is a crucial tool in molecular biology and genome analysis, and it is considered one of the most important tasks in bioinformatics [1]. It helps to construct phylogenetic trees of related DNA sequences, to predict the function and structure of unknown protein sequences by aligning them with sequences whose function and structure are already known, and to compare the structural relationships between sequences by aligning multiple sequences simultaneously and establishing correspondences between their elements [2].

Discovering the optimal alignment of multiple biological sequences is known to be an NP-complete problem [3]. It has been formulated as a combinatorial optimization problem [4], which can be solved using exact or approximate algorithms. These algorithms make it possible to exploit genetic information to determine evolutionary relationships among living organisms [3].

Recently, the trend has shifted toward iterative algorithms to tackle the MSA problem. These approaches improve a given initial alignment through a series of iterations until a stopping criterion is reached. They include the genetic algorithm (GA) [5], the simulated annealing algorithm (SA) [6], particle swarm optimization (PSO) [7], the GA-ACO algorithm [8], the ant colony algorithm [9], and others. Generally, these metaheuristics are able to find nearly optimal solutions for large instances in a reasonable processing time.

In this study, we propose a hybrid approach, called the SPSO algorithm, to solve the MSA problem. The developed model combines the PSO algorithm with the simulated annealing technique. The remainder of the paper is organized as follows: Sect. 2 presents a brief review of the research related to the proposed framework. In Sect. 3, the PSO and SA concepts are described. In Sect. 4, our proposed SPSO algorithm is explained in detail. In Sect. 5, the simulation results are provided. Finally, the study is concluded in Sect. 6.

2 Background

A brief review of related work on multiple sequence alignment using iterative methods is presented in this section. Riaz et al. [10] presented a tabu search algorithm to align multiple sequences. Their framework implements the adaptive memory features typical of tabu search to obtain multiple sequence alignments, where the quality of an alignment is measured by the COFFEE objective function. In [11], the authors proposed a novel approach to multiple sequence alignment based on particle swarm optimization (PSO) to improve a sequence alignment previously obtained with Clustal X.

In Ref. [12], the authors addressed the MSA problem by applying a genetic algorithm with a reserve selection mechanism to avoid premature convergence in the GA, obtaining better results than the classical GA. The authors in [13] proposed an algorithm based on binary PSO to address the multiple sequence alignment problem. Simulation results using the SP score measure and nine BAliBASE test cases showed that the proposed BPSO algorithm outperforms the ClustalW and SAGA algorithms.

An artificial bee colony algorithm for solving the MSA problem is introduced in [14]. In Ref. [15], Cutello et al. presented an immune-inspired algorithm (IMSA) to tackle the multiple sequence alignment problem using ad hoc mutation operators. Experimental results on BAliBASE v.1.0 show that IMSA is superior to PRRP, CLUSTALX, SAGA, DIALIGN, PIMA, MULTIALIGN and PILEUP8. In [16], the simulated annealing technique was applied to the MSA problem using a set of DNA benchmarks of HIV genes from humans and simians.

In Ref. [17], the authors proposed a hybrid of the GA and the cuckoo search algorithm to improve multiple sequence alignment; the obtained results were compared with ClustalW on five different datasets. Recently, an efficient method using a multi-objective genetic algorithm (MSAGMOGA) to discover optimal alignments was proposed in [18]. Experiments on the BAliBASE 2.0 database confirmed that MSAGMOGA obtains better results than the MUSCLE, SAGA and MSA-GA methods.

3 Preliminaries

3.1 Outline of the Particle Swarm Optimization (PSO)

Particle swarm optimization (PSO) is a population-based algorithm first developed by Kennedy and Eberhart [19], inspired by bird flocking and fish schooling. PSO uses a population of individuals, called particles, and two primary operators: velocity update and position update. During each generation, each particle moves through the search space according to its own best position and the global best position. A new velocity value for each particle is calculated based on its current velocity, its distance from its previous best position, and its distance from the global best position. The evolution of the swarm is governed by the following equations:

$$ V^{(k+1)} = w \cdot V^{(k)} + c_{1} \cdot rand_{1} \cdot \left( pbest^{(k)} - X^{(k)} \right) + c_{2} \cdot rand_{2} \cdot \left( gbest^{(k)} - X^{(k)} \right). $$
(1)
$$ X^{(k+1)} = X^{(k)} + V^{(k+1)}. $$
(2)

where:

X is the position of the particle,

V is the velocity of the particle,

w is the inertia weight,

pbest is the best position of the particle,

gbest is the global best position of the swarm,

rand1, rand2 are random values between 0 and 1,

c1, c2 are positive constants which determine the impact of the personal best solution and the global best solution on the search process, respectively,

k is the iteration number.

Concerning the stopping condition, the PSO algorithm generally terminates after a set number of iterations or when a minimum error is achieved. All parameters of the PSO algorithm are fixed experimentally in order to obtain a good compromise between the convergence time of the algorithm and the quality of the final solution.
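To make the update concrete, the following is a minimal sketch of one PSO step applying Eqs. (1) and (2) to a real-valued particle; the parameter values (w, c1, c2) and the two-dimensional search space are assumptions chosen only for the example.

```java
import java.util.Random;

// Minimal sketch of one PSO iteration over a two-dimensional real-valued particle.
public class PsoUpdateSketch {
    public static void main(String[] args) {
        Random rnd = new Random();
        double w = 0.7, c1 = 1.5, c2 = 1.5;                 // assumed values, not taken from the paper
        double[] x = {0.3, -1.2}, v = {0.0, 0.0};           // current position and velocity
        double[] pbest = {0.5, -1.0}, gbest = {1.0, 0.0};   // personal and global best positions

        for (int d = 0; d < x.length; d++) {
            double r1 = rnd.nextDouble(), r2 = rnd.nextDouble();
            // Eq. (1): inertia term + cognitive term + social term
            v[d] = w * v[d] + c1 * r1 * (pbest[d] - x[d]) + c2 * r2 * (gbest[d] - x[d]);
            // Eq. (2): move the particle with the new velocity
            x[d] += v[d];
        }
        System.out.println(java.util.Arrays.toString(x));
    }
}
```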

3.2 Outline of the Simulated Annealing (SA)

Simulated annealing (SA) is a general probabilistic local search algorithm proposed by Kirkpatrick et al. [20] to solve difficult optimization problems. It is inspired by the annealing of solids in physics: SA models the slow cooling of a solid toward its minimum-energy state as an analogy for reaching the minimum of a function. It attains an optimal or near-optimal solution through an iterative cooling process that starts from a high temperature, at which the solid's particles are in the liquid phase. Simulated annealing uses a control parameter, the temperature T, for the cooling process. The solid is allowed to reach thermal equilibrium at each temperature T, where its energy E is probabilistically distributed as given in Eq. (3), with \( k_{b} \) the Boltzmann constant.

$$ P(E) = e^{\left( \frac{-E}{k_{b} T} \right)}. $$
(3)

In the combinatorial optimization context, the search moves from a solution to one of its neighbors in the search space according to a probabilistic criterion. If the cost decreases, the move is accepted and the new solution is retained. Otherwise, the move is accepted only with a probability that depends on the cost increase and the temperature parameter T [20].
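This acceptance rule can be summarized in a short sketch; it is the standard Metropolis criterion for a minimization problem, with arbitrary example values, rather than code taken from any specific implementation.

```java
import java.util.Random;

// Metropolis acceptance rule: always keep improving moves, keep worsening
// moves with probability exp(-delta / T).
public class MetropolisSketch {
    static boolean accept(double currentCost, double candidateCost, double temperature, Random rnd) {
        double delta = candidateCost - currentCost;     // minimization: positive delta means worse
        if (delta <= 0) return true;                    // cost decreases (or stays equal): accept
        return rnd.nextDouble() < Math.exp(-delta / temperature);
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        System.out.println(accept(10.0, 12.0, 5.0, rnd));   // a worsening move, sometimes accepted
    }
}
```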

4 Proposed Method

PSO performs well at global search but is less efficient at local search: it suffers from weak local search ability and can become trapped in local minima. SA, on the other hand, is good at local search but weaker at global search. By accepting worsening candidate solutions through the Metropolis criterion, it is able to escape local optima and explore other regions of the solution space.

In order to construct an algorithm that avoids the weaknesses and exploits the advantages of both PSO and SA, we propose a hybrid approach that combines PSO with simulated annealing. The new hybrid algorithm, called Simulated Particle Swarm Optimization (SPSO), conducts both global search and local search in every iteration, which significantly increases the probability of obtaining better solutions. At each iteration, the proposed hybrid SPSO algorithm applies the PSO algorithm to guide the global search and uses SA to improve gbest, which helps PSO escape from local optima and increases the convergence speed of SPSO. The flowchart of the proposed SPSO is presented in Fig. 1.

Fig. 1. SPSO algorithm flowchart.

4.1 PSO Components of MSA Problem

Particle Representation.

Each particle represents a potential solution to the MSA problem, i.e., a sequence alignment. A particle is represented as a set of vectors, where each vector specifies the positions of the gaps in one of the sequences to be aligned [21].
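A minimal sketch of this encoding, with hypothetical sequences and gap positions, could look as follows: each particle stores one vector of gap positions per sequence, from which the gapped (aligned) sequences can be rebuilt.

```java
// One particle = one gap-position vector per sequence.
public class ParticleSketch {
    int[][] gapPositions;   // gapPositions[i] holds the column indices of the gaps in sequence i

    ParticleSketch(int[][] gapPositions) { this.gapPositions = gapPositions; }

    // Rebuild the aligned (gapped) string of one sequence from its gap-position vector.
    static String insertGaps(String seq, int[] gaps) {
        StringBuilder sb = new StringBuilder(seq);
        int[] sorted = gaps.clone();
        java.util.Arrays.sort(sorted);
        for (int g : sorted) sb.insert(g, '-');          // positions refer to the growing gapped string
        return sb.toString();
    }

    public static void main(String[] args) {
        ParticleSketch p = new ParticleSketch(new int[][]{{1, 4}, {0, 2, 5}});  // hypothetical particle
        String[] seqs = {"ACGT", "CGT"};
        for (int i = 0; i < seqs.length; i++)
            System.out.println(insertGaps(seqs[i], p.gapPositions[i]));         // prints A-CG-T and -C-GT-
    }
}
```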

Swarm Initialization.

The size of the swarm is determined by the user. The initial set of particles is generated by adding gaps to each sequence at random positions, so that all the sequences have the same length L, whose value is 1.2 times the length of the longest sequence [21].
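The sketch below illustrates this initialization under the simplest reading of the rule: every sequence is padded with gaps at random positions until it reaches L, taken as 1.2 times the longest sequence length rounded up; the toy sequences are hypothetical.

```java
import java.util.Random;

// Random initialization: pad every sequence with gaps up to a common length L.
public class InitSketch {
    static String[] randomAlignment(String[] seqs, Random rnd) {
        int longest = 0;
        for (String s : seqs) longest = Math.max(longest, s.length());
        int L = (int) Math.ceil(1.2 * longest);               // common alignment length
        String[] aligned = new String[seqs.length];
        for (int i = 0; i < seqs.length; i++) {
            StringBuilder sb = new StringBuilder(seqs[i]);
            while (sb.length() < L)
                sb.insert(rnd.nextInt(sb.length() + 1), '-'); // gap at a random position
            aligned[i] = sb.toString();
        }
        return aligned;
    }

    public static void main(String[] args) {
        String[] seqs = {"ACGTAC", "ACTAC", "AGTC"};          // hypothetical input sequences
        for (String s : randomAlignment(seqs, new Random())) System.out.println(s);
    }
}
```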

Fitness Evaluation.

The fitness value determines which alignments survive into the next generation. The sum-of-pairs (SPS) score of a multiple sequence alignment is used to compute the fitness. The score assigned to an alignment is the sum of the scores (SP) of the alignments of each pair of sequences, and the score of each pair of sequences is the sum of the scores assigned to each pair of matched symbols, as given by the substitution matrix. The score of a multiple alignment is given as follows:

$$ Score(A) = \sum\limits_{i=1}^{k-1} \sum\limits_{j=i+1}^{k} S(A_{i}, A_{j}). $$
(4)

where \( S(A_{i}, A_{j}) \) is the alignment score between two given sequences \( A_{i} \) and \( A_{j} \).
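A minimal sketch of Eq. (4) follows. A real implementation would score residue pairs with a substitution matrix such as PAM or BLOSUM; here a simple match/mismatch/gap scheme with arbitrary values is assumed purely for illustration.

```java
// Sum-of-pairs score of an alignment, Eq. (4).
public class SumOfPairsSketch {
    // Score of one pair of equally long, gapped sequences.
    static double pairScore(String a, String b) {
        double s = 0;
        for (int c = 0; c < a.length(); c++) {
            char x = a.charAt(c), y = b.charAt(c);
            if (x == '-' && y == '-') continue;       // gap-gap column: no contribution
            else if (x == '-' || y == '-') s -= 1;    // gap penalty (assumed value)
            else s += (x == y) ? 2 : -1;              // match / mismatch scores (assumed values)
        }
        return s;
    }

    // Eq. (4): sum the pairwise scores over all pairs of sequences in the alignment.
    static double sumOfPairs(String[] alignment) {
        double total = 0;
        for (int i = 0; i < alignment.length - 1; i++)
            for (int j = i + 1; j < alignment.length; j++)
                total += pairScore(alignment[i], alignment[j]);
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOfPairs(new String[]{"AC-GT", "ACAGT", "A--GT"}));
    }
}
```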

Particle Move.

In the PSO algorithm, each particle moves towards the leader at a speed proportional to the distance between the particle and the leader. In this paper, this distance is measured as the proportion of gaps that do not match between the two alignments, according to the formula:

$$ Distance = \frac{\text{non-matching gaps}}{\text{total gaps}}. $$
(5)

To move particles towards the leader, an operator similar to the crossover operator of genetic algorithms is used [21]. It consists of selecting a crossover point that divides the alignment into two segments; a segment of the particle is then replaced with the corresponding segment of the leader. This replacement is achieved by removing from the particle the gaps that lie in the segment and then adding the gaps from the leader's segment.
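The sketch below illustrates the distance of Eq. (5) together with a simplified interpretation of the crossover-like move: the particle keeps its columns left of the cut and redistributes its remaining residues according to the leader's gap pattern, padding with trailing gaps to keep the alignment rectangular. The exact gap bookkeeping of the original operator may differ.

```java
import java.util.Random;

// Distance to the leader (Eq. (5)) and a crossover-like move towards it.
public class MoveSketch {
    // Eq. (5): fraction of the particle's gap positions that the leader does not share.
    static double distance(String[] particle, String[] leader) {
        int totalGaps = 0, nonMatching = 0;
        for (int i = 0; i < particle.length; i++)
            for (int c = 0; c < particle[i].length(); c++)
                if (particle[i].charAt(c) == '-') {
                    totalGaps++;
                    if (c >= leader[i].length() || leader[i].charAt(c) != '-') nonMatching++;
                }
        return totalGaps == 0 ? 0.0 : (double) nonMatching / totalGaps;
    }

    // One-point, crossover-like move towards the leader (simplified interpretation).
    static String[] moveTowardsLeader(String[] particle, String[] leader, Random rnd) {
        int minLen = Integer.MAX_VALUE;
        for (int i = 0; i < particle.length; i++)
            minLen = Math.min(minLen, Math.min(particle[i].length(), leader[i].length()));
        int cut = rnd.nextInt(minLen);                                    // crossover column

        String[] child = new String[particle.length];
        int maxLen = 0;
        for (int i = 0; i < particle.length; i++) {
            String tail = particle[i].substring(cut).replace("-", "");    // residues after the cut, gaps removed
            StringBuilder sb = new StringBuilder(particle[i].substring(0, cut));
            int r = 0;
            for (char ch : leader[i].substring(cut).toCharArray())        // follow the leader's gap pattern
                sb.append(ch == '-' || r >= tail.length() ? '-' : tail.charAt(r++));
            while (r < tail.length()) sb.append(tail.charAt(r++));        // leftover residues, if any
            child[i] = sb.toString();
            maxLen = Math.max(maxLen, sb.length());
        }
        for (int i = 0; i < child.length; i++) {                          // keep the alignment rectangular
            StringBuilder sb = new StringBuilder(child[i]);
            while (sb.length() < maxLen) sb.append('-');
            child[i] = sb.toString();
        }
        return child;
    }
}
```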

4.2 SA Components of MSA Problem

According to the basic elements of simulated annealing described above, the components of our proposed algorithm for the MSA problem are as follows:

Cost Function.

The cost function is used to evaluate the quality of each alignment in the swarm. To keep the same measure of solution quality, we use the sum-of-pairs (SPS) score as the cost function, i.e., the same one used in the PSO algorithm.

Initial Solution.

The generation of an initial solution is an important step towards obtaining a final improved alignment. In our method, the initial solution is constructed by randomly inserting gaps into each sequence of the alignment.

Generation of Neighbors.

In the multiple sequence alignment context, neighbors of a current solution are obtained by perturbing the gap positions in the different sequences. In this work, we apply a simple but efficient strategy to change the positions of these gaps, called the LocalShuffle operator. Its main idea is as follows: first, a random amino acid is picked from a randomly chosen sequence of the alignment, and the operator checks whether one of its neighbors is a gap. If so, the selected amino acid is swapped with the neighboring gap. If both neighbors are gaps, one of them is picked at random [22].
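A minimal sketch of the LocalShuffle operator as described above is given below; it assumes every sequence of the alignment contains at least one residue, which holds for the initialization used here.

```java
import java.util.Random;

// LocalShuffle: swap a randomly chosen residue with a neighbouring gap.
public class LocalShuffleSketch {
    static String[] localShuffle(String[] alignment, Random rnd) {
        String[] neighbour = alignment.clone();
        int s = rnd.nextInt(alignment.length);                // randomly chosen sequence
        char[] row = alignment[s].toCharArray();

        int p;                                                // randomly chosen residue (non-gap) position
        do { p = rnd.nextInt(row.length); } while (row[p] == '-');

        boolean leftGap  = p > 0 && row[p - 1] == '-';
        boolean rightGap = p < row.length - 1 && row[p + 1] == '-';
        int g = -1;
        if (leftGap && rightGap) g = rnd.nextBoolean() ? p - 1 : p + 1;   // both neighbours are gaps
        else if (leftGap)  g = p - 1;
        else if (rightGap) g = p + 1;

        if (g >= 0) {                                         // swap residue and gap
            row[g] = row[p];
            row[p] = '-';
            neighbour[s] = new String(row);
        }
        return neighbour;
    }
}
```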

Choice of Cooling Schedule.

An effective cooling schedule is essential to reducing the amount of time required by the algorithm to find an optimal solution. Several temperature-decreasing schemes have been proposed in the literature; they include static schedules and adaptive schedules [16]. Here, we use the most common cooling function, defined by \( T_{k+1} = \alpha \cdot T_{k} \). This function decreases the temperature by a factor α, where α ∈ [0.70, 1.0). The pseudo-code of our SA is given below.

figure a. Pseudo-code of the SA procedure.
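The figure itself is not reproduced here; the following is a hedged sketch of the SA component as described in the text, reusing the sumOfPairs and localShuffle sketches above: LocalShuffle moves, geometric cooling, and Metropolis acceptance applied to the SPS score (which is maximized, so the sign of the acceptance test is flipped with respect to the minimization form of Sect. 3.2). Parameter values are left to the caller.

```java
import java.util.Random;

// Sketch of the SA component for MSA: LocalShuffle moves, SPS objective, geometric cooling.
public class SaSketch {
    static String[] anneal(String[] initial, double t0, double tMin, double alpha,
                           int movesPerLevel, Random rnd) {
        String[] current = initial, best = initial;
        double currentScore = SumOfPairsSketch.sumOfPairs(current), bestScore = currentScore;

        for (double t = t0; t > tMin; t *= alpha) {            // geometric cooling, alpha in [0.70, 1.0)
            for (int m = 0; m < movesPerLevel; m++) {
                String[] candidate = LocalShuffleSketch.localShuffle(current, rnd);
                double candScore = SumOfPairsSketch.sumOfPairs(candidate);
                double delta = candScore - currentScore;
                // Metropolis rule for a maximized score: accept improvements, and
                // accept worsening moves with probability exp(delta / t).
                if (delta >= 0 || rnd.nextDouble() < Math.exp(delta / t)) {
                    current = candidate;
                    currentScore = candScore;
                }
                if (currentScore > bestScore) { best = current; bestScore = currentScore; }
            }
        }
        return best;
    }
}
```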

Having described the key components of both the PSO and SA algorithms, the pseudo-code of our SPSO procedure is summarized as follows:

figure b. Pseudo-code of the SPSO procedure.
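Again, the figure is not reproduced; the following is a hedged sketch of the hybrid loop described in Sect. 4, reusing the sketches above: every particle takes one crossover-like step towards gbest per iteration, after which SA refines gbest. The toy sequences and all parameter values are assumptions.

```java
import java.util.Random;

// Sketch of the SPSO main loop: PSO global search followed by SA refinement of gbest.
public class SpsoSketch {
    public static void main(String[] args) {
        Random rnd = new Random();
        String[] seqs = {"ACGTACGT", "ACGACGT", "AGTACG"};    // hypothetical input sequences
        int swarmSize = 10, iterations = 100;                 // assumed parameter values

        // Random initial swarm; gbest is the best initial particle.
        String[][] swarm = new String[swarmSize][];
        String[][] pbest = new String[swarmSize][];
        String[] gbest = null;
        for (int i = 0; i < swarmSize; i++) {
            swarm[i] = InitSketch.randomAlignment(seqs, rnd);
            pbest[i] = swarm[i];
            if (gbest == null
                    || SumOfPairsSketch.sumOfPairs(swarm[i]) > SumOfPairsSketch.sumOfPairs(gbest))
                gbest = swarm[i];
        }

        for (int it = 0; it < iterations; it++) {
            // Global search: move every particle towards the leader and update pbest/gbest.
            for (int i = 0; i < swarmSize; i++) {
                swarm[i] = MoveSketch.moveTowardsLeader(swarm[i], gbest, rnd);
                if (SumOfPairsSketch.sumOfPairs(swarm[i]) > SumOfPairsSketch.sumOfPairs(pbest[i]))
                    pbest[i] = swarm[i];
                if (SumOfPairsSketch.sumOfPairs(pbest[i]) > SumOfPairsSketch.sumOfPairs(gbest))
                    gbest = pbest[i];
            }
            // Local search: refine the current gbest with the SA component.
            gbest = SaSketch.anneal(gbest, 100.0, 0.1, 0.9, 20, rnd);
        }
        System.out.println("Best SPS score: " + SumOfPairsSketch.sumOfPairs(gbest));
    }
}
```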

5 Simulation and Results

The proposed approach is implemented in the Java language. All tests were carried out on a PC with a 2.66 GHz Intel Pentium IV processor and 4 GB of RAM. We conducted several experiments to demonstrate the effectiveness of our SPSO algorithm. For this purpose, a set of benchmark sequences with different identities and lengths was chosen from the BAliBASE 1.0 library [23]. The characteristics of the data used and the parameter settings of the SPSO algorithm are summarized in Tables 1 and 2, respectively.

Table 1. Characteristics of benchmark sequences.
Table 2. Parameter settings for experiments.

The results obtained by the GA, PSO, ABC and our SPSO algorithm, reported as the best, worst and average SPS values over 10 runs, are summarized in Tables 3, 4 and 5, respectively.

Table 3. Comparison results for short sequences.
Table 4. Comparison results for medium sequences.
Table 5. Comparison results for long sequences.

The results reported in Tables 3, 4 and 5 clearly show the considerable improvement in scores achieved by the proposed SPSO approach. Indeed, it produces alignments of much better quality than those of the other cited methods in all the specified categories.

To further assess the potential of our SPSO algorithm, a second experiment using other datasets is performed, where the goal is to compare our SPSO approach with the TLPSO-MSA [24] technique. In this experiment, three different sets of proteins from the BAliBASE 3.0 database [25] are selected. The average SP and TC scores [26] are reported in Table 6.

Table 6. Comparison results on the selected test cases.

From Table 6, it can be seen that our SPSO clearly outperforms the TLPSO-MSA method in terms of TC score in all cases. In terms of SP score, it finds better results than TLPSO-MSA for the RV11 and RV20 protein families and produces competitive results on the RV12 dataset.

6 Conclusion

In this work, we contributed to the ongoing research by proposing a hybrid model for finding optimized alignments for the MSA problem. The developed approach combines the random search and global convergence characteristics of PSO with the power of simulated annealing to update the global best solution. The performance of the proposed SPSO algorithm is evaluated on a set of BAliBASE benchmark problems and compares favorably with other algorithms from the literature. The results demonstrate that the proposed approach is, overall, more effective at finding better alignments in a reasonable processing time.

In the future, we will revise our score function to make it more realistic. We may also use another intelligent heuristic to generate the initial swarm in order to increase the convergence speed of the algorithm. In addition, other efficient neighborhood mechanisms could be incorporated to improve the quality of the global solution. A comparison of the proposed method with other aligners such as ClustalW, SAGA or MULTALIGN could further verify its effectiveness.