Abstract
When aligning biological sequences, the choice of parameter values for the alignment scoring function is critical. Small changes in gap penalties, for example, can yield radically different alignments. A rigorous way to compute parameter values that are appropriate for biological sequences is inverse parametric sequence alignment. Given a collection of examples of biologically correct alignments, this is the problem of finding parameter values that make the example alignments score close to optimal. We extend prior work on inverse alignment to partial examples and to an improved model based on minimizing the average error of the examples. Experiments on benchmark biological alignments show we can find parameters that generalize across protein families and that boost the recovery rate for multiple sequence alignment by up to 25%.
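The criterion described above, that an example alignment should score close to optimal under the chosen parameters, can be illustrated with a small sketch. This is a toy under stated assumptions, not the method of the paper: it assumes a cost-minimizing global alignment model in which matches cost 0, every mismatch has one cost, and every gap character has one cost. The function names (`optimal_cost`, `example_cost`, `average_excess`) are hypothetical, and "average excess" is only a stand-in for the average error mentioned in the abstract, whose exact definition is not given here. Partial examples are not handled.

```python
from typing import List, Tuple


def optimal_cost(a: str, b: str, mismatch: float, gap: float) -> float:
    """Minimum global alignment cost of a and b (assumed model:
    matches cost 0, one mismatch cost, linear gap cost)."""
    # prev[j] = cost of aligning the empty prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            sub = prev[j - 1] + (0.0 if a[i - 1] == b[j - 1] else mismatch)
            curr.append(min(sub, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def example_cost(row_a: str, row_b: str, mismatch: float, gap: float) -> float:
    """Cost of a given alignment, written as two equal-length rows
    with '-' marking gap characters."""
    assert len(row_a) == len(row_b)
    cost = 0.0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            cost += gap
        elif x != y:
            cost += mismatch
    return cost


def average_excess(
    examples: List[Tuple[str, str]], mismatch: float, gap: float
) -> float:
    """Average amount by which the example alignments cost more than
    an optimal alignment of the same sequences (0 means every example
    is itself optimal under these parameters)."""
    total = 0.0
    for row_a, row_b in examples:
        a, b = row_a.replace("-", ""), row_b.replace("-", "")
        total += example_cost(row_a, row_b, mismatch, gap)
        total -= optimal_cost(a, b, mismatch, gap)
    return total / len(examples)


examples = [("AC-", "-CA")]
print(average_excess(examples, mismatch=1.0, gap=3.0))  # 4.0
print(average_excess(examples, mismatch=3.0, gap=1.0))  # 0.0
```

In this toy, the example alignment uses two gaps, so it is optimal when gaps are cheap relative to mismatches and far from optimal when they are expensive. Inverse alignment asks for parameter values that drive this kind of discrepancy down across all examples.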
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kim, E., Kececioglu, J. (2007). Inverse Sequence Alignment from Partial Examples. In: Giancarlo, R., Hannenhalli, S. (eds) Algorithms in Bioinformatics. WABI 2007. Lecture Notes in Computer Science, vol 4645. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74126-8_33
DOI: https://doi.org/10.1007/978-3-540-74126-8_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74125-1
Online ISBN: 978-3-540-74126-8
eBook Packages: Computer Science, Computer Science (R0)