Abstract
We propose a simple, tractable pair hidden Markov model (pair HMM) for pairwise sequence alignment that accounts for the presence of short tandem repeats. Using the framework of gain functions, we design several optimization criteria for decoding this model and describe the resulting decoding algorithms, ranging from the traditional Viterbi and posterior decoding to block-based decoding algorithms specialized for our model. We compare the accuracy of the individual decoding algorithms on simulated data and find that our approach outperforms the classical three-state pair HMM.
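The abstract measures the proposed model against the classical three-state pair HMM and its Viterbi decoding. The Python sketch below shows that baseline in its standard textbook form. It does not implement the authors' tandem-repeat model or their gain-function decoders. Several details are illustrative assumptions, not values or names from the paper: the parameter values `DELTA`, `EPSILON`, `P_SAME`, `P_DIFF` and `Q`, the DNA alphabet, the omission of the end-state transition, and the function name `viterbi_align`.

```python
import math

# Illustrative parameters (assumptions, not values from the paper).
DELTA = 0.1        # probability of opening a gap from the match state M
EPSILON = 0.4      # probability of extending a gap
P_SAME = 0.2       # emission p(a, a) for each of the 4 DNA bases (sums to 0.8)
P_DIFF = 0.2 / 12  # emission p(a, b), a != b (12 ordered pairs, sums to 0.2)
Q = 0.25           # background emission probability in the gap states X and Y
NEG = float("-inf")
STATES = ("M", "X", "Y")

# Log transition probabilities; direct X <-> Y transitions are disallowed.
TRANS = {
    ("M", "M"): math.log(1 - 2 * DELTA),
    ("M", "X"): math.log(DELTA),
    ("M", "Y"): math.log(DELTA),
    ("X", "M"): math.log(1 - EPSILON),
    ("X", "X"): math.log(EPSILON),
    ("X", "Y"): NEG,
    ("Y", "M"): math.log(1 - EPSILON),
    ("Y", "Y"): math.log(EPSILON),
    ("Y", "X"): NEG,
}


def viterbi_align(x, y):
    """Most probable alignment of x and y under a three-state pair HMM.

    The begin state behaves like M and the end transition is omitted
    (simplifying assumptions). Returns (row_x, row_y, log_probability).
    """
    n, m = len(x), len(y)
    # V[s][i][j]: best log-probability of a path that has emitted x[:i] and
    # y[:j] and ends in state s; B[s][i][j] stores that path's previous state.
    V = {s: [[NEG] * (m + 1) for _ in range(n + 1)] for s in STATES}
    B = {s: [[None] * (m + 1) for _ in range(n + 1)] for s in STATES}
    V["M"][0][0] = 0.0  # begin state

    def best_prev(target, i, j):
        # Best predecessor state at cell (i, j) for a transition into target.
        prev = max(STATES, key=lambda s: V[s][i][j] + TRANS[(s, target)])
        return prev, V[prev][i][j] + TRANS[(prev, target)]

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # M emits the aligned pair (x[i-1], y[j-1])
                emit = math.log(P_SAME if x[i - 1] == y[j - 1] else P_DIFF)
                prev, score = best_prev("M", i - 1, j - 1)
                V["M"][i][j], B["M"][i][j] = score + emit, prev
            if i > 0:  # X emits x[i-1] against a gap
                prev, score = best_prev("X", i - 1, j)
                V["X"][i][j], B["X"][i][j] = score + math.log(Q), prev
            if j > 0:  # Y emits y[j-1] against a gap
                prev, score = best_prev("Y", i, j - 1)
                V["Y"][i][j], B["Y"][i][j] = score + math.log(Q), prev

    # Trace back from the best state in the final cell.
    state = max(STATES, key=lambda s: V[s][n][m])
    best = V[state][n][m]
    row_x, row_y, i, j = [], [], n, m
    while (i, j) != (0, 0):
        prev = B[state][i][j]
        if state == "M":
            row_x.append(x[i - 1])
            row_y.append(y[j - 1])
            i, j = i - 1, j - 1
        elif state == "X":
            row_x.append(x[i - 1])
            row_y.append("-")
            i -= 1
        else:
            row_x.append("-")
            row_y.append(y[j - 1])
            j -= 1
        state = prev
    return "".join(reversed(row_x)), "".join(reversed(row_y)), best
```

For example, `viterbi_align("ACGT", "AGT")` returns the rows `"ACGT"` and `"A-GT"` together with the log-probability of that single best path. The sketch works in log space so that long alignments do not underflow. A tandem-repeat-aware model would add further states and would also use other decoding criteria, such as posterior or block-based decoding. This sketch attempts neither.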
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nánási, M., Vinař, T., Brejová, B. (2013). Probabilistic Approaches to Alignment with Tandem Repeats. In: Darling, A., Stoye, J. (eds) Algorithms in Bioinformatics. WABI 2013. Lecture Notes in Computer Science, vol 8126. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40453-5_22
DOI: https://doi.org/10.1007/978-3-642-40453-5_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40452-8
Online ISBN: 978-3-642-40453-5
eBook Packages: Computer Science, Computer Science (R0)