Evaluating Multiple Sequence Alignments Using a LS-SVM Approach with a Heterogeneous Set of Biological Features

Ortuño, Francisco; Valenzuela, Olga; Pomares, Héctor; Rojas, Ignacio

doi:10.1007/978-3-642-38682-4_18

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7903)

Included in the following conference series:

International Work-Conference on Artificial Neural Networks

Abstract

Multiple sequence alignment (MSA) is an essential step in many important bioinformatics tasks, such as structure prediction, biological function analysis and phylogenetic modeling. However, current MSA methodologies have not reached a consensus on how sequences should be aligned accurately. Moreover, these tools usually produce only partially optimal alignments, because each one focuses on specific features. As a result, the same set of sequences can yield quite different alignments, especially when the sequences are distantly related. Researchers and biologists likewise do not agree on how the quality of MSAs should be evaluated in order to choose the most suitable methodology, and recent evaluations therefore tend to use more complex scores that include supplementary biological features. In this work, we address the evaluation of MSAs with a novel supervised learning approach based on Least Squares Support Vector Machines (LS-SVM). The algorithm combines a set of heterogeneous features and scores in order to estimate alignment accuracy, and it is assessed on the BAliBASE benchmark.
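
As an illustration of the kind of model the abstract refers to, the following is a minimal sketch of LS-SVM regression in its standard dual form, where training reduces to solving one linear system. It is not the authors' implementation: the RBF kernel, the hyperparameters `gamma` and `sigma`, the function names, and the synthetic three-column feature matrix (standing in for per-alignment features with a quality score as target) are all assumptions made for the example, and no BAliBASE data is used.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve the LS-SVM regression system
        [ 0   1^T         ] [ b     ]   [ 0 ]
        [ 1   K + I/gamma ] [ alpha ] = [ y ]
    and return the bias b and the dual weights alpha."""
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]

def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    """Predict f(x) = sum_i alpha_i * k(x, x_i) + b for each row of X_new."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Synthetic stand-in data (hypothetical): 40 "alignments", 3 features each,
# with a made-up target playing the role of an alignment quality score.
rng = np.random.default_rng(0)
X = rng.random((40, 3))
y = 0.6 * X[:, 0] + 0.3 * np.sin(3.0 * X[:, 1]) + 0.1 * X[:, 2]

b, alpha = lssvm_fit(X, y, gamma=50.0, sigma=1.0)
pred = lssvm_predict(X, b, alpha, X, sigma=1.0)
print("mean absolute training error:", np.mean(np.abs(pred - y)))
```

In this formulation every training point receives a nonzero weight, so the model is not sparse as a classical SVM is; in exchange, fitting requires only a single call to a linear solver. In practice `gamma` and `sigma` would be tuned, for example by cross-validation.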

Author information

Authors and Affiliations

Department of Computer Architecture and Computer Technology, CITIC-UGR, University of Granada, Spain
Francisco Ortuño, Héctor Pomares & Ignacio Rojas
Department of Applied Mathematics, University of Granada, Spain
Olga Valenzuela

Editor information

Editors and Affiliations

Department of Computer Architecture and Computer Technology, University of Granada, Periodista Daniel Saucedo Aranda s/n, 18071, Granada, Spain
Ignacio Rojas
Department of Electronics Technology, University of Malaga, 29071, Malaga, Spain
Gonzalo Joya
Department of Electronics Engineering, Universitat Politecnica de Catalunya, 08034, Barcelona, Spain
Joan Cabestany

Cite this paper

Ortuño, F., Valenzuela, O., Pomares, H., Rojas, I. (2013). Evaluating Multiple Sequence Alignments Using a LS-SVM Approach with a Heterogeneous Set of Biological Features. In: Rojas, I., Joya, G., Cabestany, J. (eds) Advances in Computational Intelligence. IWANN 2013. Lecture Notes in Computer Science, vol 7903. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38682-4_18

DOI: https://doi.org/10.1007/978-3-642-38682-4_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38681-7
Online ISBN: 978-3-642-38682-4
eBook Packages: Computer Science, Computer Science (R0)
