A Maximum-Likelihood Formulation and EM Algorithm for the Protein Multiple Alignment Problem

Sulimova, Valentina; Razin, Nikolay; Mottl, Vadim; Muchnik, Ilya; Kulikowski, Casimir

doi:10.1007/978-3-642-16001-1_15

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6282)

Included in the following conference series:

IAPR International Conference on Pattern Recognition in Bioinformatics

Abstract

A given group of protein sequences of different lengths is considered as resulting from random transformations of independent random ancestor sequences of the same preset smaller length, each produced in accordance with an unknown common probabilistic profile. We describe the transformation process by a Hidden Markov Model (HMM), which is a direct generalization of the PAM model for amino acids. We formulate the problem of finding the maximum-likelihood probabilistic ancestor profile and demonstrate its practicality. The proposed method yields the ancestor profile and the posterior distribution of its HMM simultaneously, which permits efficient determination of the most probable multiple alignment of all the sequences. Results obtained on the BAliBASE 3.0 protein alignment benchmark indicate that the proposed method is generally more accurate than popular multiple alignment methods such as CLUSTALW, DIALIGN and ProbAlign.
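The idea of estimating a maximum-likelihood ancestor profile by EM can be illustrated with a deliberately simplified sketch. It is not the algorithm of the chapter: it assumes the sequences already have the same length as the profile, so insertions and deletions (and hence the HMM) are ignored, and only PAM-like substitutions remain. The three-letter alphabet, the substitution table `PSI`, and the function names are all hypothetical choices made for illustration.

```python
import math

ALPHABET = "ACD"  # toy alphabet (assumption); real proteins use 20 amino acids

# Hypothetical substitution probabilities PSI[a][x] = P(observed x | ancestor a).
PSI = {
    "A": {"A": 0.8, "C": 0.1, "D": 0.1},
    "C": {"A": 0.1, "C": 0.8, "D": 0.1},
    "D": {"A": 0.1, "C": 0.1, "D": 0.8},
}

def log_likelihood(profile, seqs):
    """Log-likelihood of the sequences under a position-wise ancestor profile."""
    total = 0.0
    for s in seqs:
        for t, x in enumerate(s):
            total += math.log(sum(profile[t][a] * PSI[a][x] for a in ALPHABET))
    return total

def em_step(profile, seqs):
    """One EM iteration: posterior over ancestor residues, then re-estimation."""
    new_profile = []
    for t in range(len(profile)):
        counts = {a: 0.0 for a in ALPHABET}
        for s in seqs:
            x = s[t]
            # E-step: P(ancestor a | observed x) is proportional to profile * PSI.
            joint = {a: profile[t][a] * PSI[a][x] for a in ALPHABET}
            z = sum(joint.values())
            for a in ALPHABET:
                counts[a] += joint[a] / z
        # M-step: the new profile column is the average posterior.
        new_profile.append({a: counts[a] / len(seqs) for a in ALPHABET})
    return new_profile

def fit_profile(seqs, n_iter=20):
    """Run EM from a uniform profile and record the log-likelihood per iteration."""
    length = len(seqs[0])
    profile = [{a: 1.0 / len(ALPHABET) for a in ALPHABET} for _ in range(length)]
    history = [log_likelihood(profile, seqs)]
    for _ in range(n_iter):
        profile = em_step(profile, seqs)
        history.append(log_likelihood(profile, seqs))
    return profile, history

SEQS = ["ACD", "ACD", "ACA", "DCD"]  # made-up equal-length sequences
PROFILE, HISTORY = fit_profile(SEQS)
```

In this toy setting each EM iteration cannot decrease the log-likelihood, which `HISTORY` lets you check. Handling sequences of different lengths would additionally require summing over alignments, which is the role the abstract assigns to the HMM.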

Author information

Authors and Affiliations

Tula State University, Lenine Ave. 92, 300600, Russia, Tula
Valentina Sulimova
MIPT, Kerchenskaya St.1A, 117303, Russia, Moscow
Nikolay Razin
Computing Center of the RAS, Vavilov St.40, 119333, Russia, Moscow
Vadim Mottl
DIMACS, Rutgers University, New Brunswick, NJ, 08901
Ilya Muchnik
Department of Computer Science, Rutgers University, New Brunswick, NJ, 08901
Casimir Kulikowski

Editor information

Editors and Affiliations

Institute for Computing and Information Sciences, Radboud University Nijmegen, Heyendaalseweg 135, 6525AJ, Nijmegen, The Netherlands
Tjeerd M. H. Dijkstra, Elena Marchiori & Tom Heskes
Institute for Computing and Information Sciences, Turku Centre for Computer Science, Radboud University Nijmegen, Heyendaalseweg 135, 6525AJ, Nijmegen, The Netherlands
Evgeni Tsivtsivadze

About this paper

Cite this paper

Sulimova, V., Razin, N., Mottl, V., Muchnik, I., Kulikowski, C. (2010). A Maximum-Likelihood Formulation and EM Algorithm for the Protein Multiple Alignment Problem. In: Dijkstra, T.M.H., Tsivtsivadze, E., Marchiori, E., Heskes, T. (eds) Pattern Recognition in Bioinformatics. PRIB 2010. Lecture Notes in Computer Science, vol 6282. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16001-1_15

DOI: https://doi.org/10.1007/978-3-642-16001-1_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16000-4
Online ISBN: 978-3-642-16001-1
eBook Packages: Computer Science, Computer Science (R0)
