Evaluating Multiple Sequence Alignments Using a LS-SVM Approach with a Heterogeneous Set of Biological Features

  • Conference paper
Advances in Computational Intelligence (IWANN 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7903)

Abstract

Multiple sequence alignment (MSA) is an essential step in many other bioinformatics tasks, such as structure prediction, biological function analysis, and phylogenetic modeling. However, current MSA methodologies do not agree on how sequences should be aligned. Moreover, these tools usually produce only partially optimal alignments, because each one focuses on specific features. As a result, the same set of sequences can yield quite different alignments, especially when the sequences are distantly related. Consequently, researchers and biologists do not agree on how the quality of MSAs should be evaluated in order to choose the most suitable methodology. Recent evaluations therefore tend to use more complex scores that include supplementary biological features. In this work, we address the evaluation of MSAs with a novel supervised learning approach based on Least Squares Support Vector Machines (LS-SVM). The algorithm takes a set of heterogeneous features and scores as input in order to estimate alignment accuracy. It is assessed on the BAliBASE benchmark.
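The abstract does not spell out the model, so the following is only a minimal sketch of textbook LS-SVM regression, not the authors' implementation. It assumes an RBF kernel, and the feature matrix, the target scores, and the names `rbf_kernel`, `lssvm_fit`, `lssvm_predict`, `GAMMA`, and `SIGMA` are all hypothetical stand-ins; the data are synthetic rather than taken from BAliBASE. In the standard formulation, training reduces to solving one linear system in the bias `b` and the dual weights `alpha`.

```python
import numpy as np

GAMMA = 100.0  # regularization constant (assumed value)
SIGMA = 1.0    # RBF kernel width (assumed value)

def rbf_kernel(A, B, sigma=SIGMA):
    # Squared Euclidean distances between every row of A and every row of B.
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma=GAMMA, sigma=SIGMA):
    # Solve the LS-SVM system
    #   [ 0   1^T         ] [ b     ]   [ 0 ]
    #   [ 1   K + I/gamma ] [ alpha ] = [ y ]
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # bias b, dual weights alpha

def lssvm_predict(X_train, b, alpha, X_new, sigma=SIGMA):
    # f(x) = sum_i alpha_i * k(x, x_i) + b
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Synthetic stand-in data: each row plays the role of one alignment described
# by three made-up features; y plays the role of a reference accuracy score.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(60, 3))
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] ** 2 + 0.2 * X[:, 2]

X_train, y_train = X[:40], y[:40]
X_test = X[40:]

b, alpha = lssvm_fit(X_train, y_train)
fitted = lssvm_predict(X_train, b, alpha, X_train)
pred = lssvm_predict(X_train, b, alpha, X_test)
```

Because the model uses equality constraints and a squared loss, each training residual equals `alpha[i] / GAMMA` and the dual weights sum to zero; both follow directly from the linear system above.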


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ortuño, F., Valenzuela, O., Pomares, H., Rojas, I. (2013). Evaluating Multiple Sequence Alignments Using a LS-SVM Approach with a Heterogeneous Set of Biological Features. In: Rojas, I., Joya, G., Cabestany, J. (eds) Advances in Computational Intelligence. IWANN 2013. Lecture Notes in Computer Science, vol 7903. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38682-4_18

  • DOI: https://doi.org/10.1007/978-3-642-38682-4_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38681-7

  • Online ISBN: 978-3-642-38682-4

  • eBook Packages: Computer Science, Computer Science (R0)
