Abstract
We investigate the order of the variance of the optimal alignments (OA) score of two independent iid binary random words having the same length. The letters are equiprobable, but the scoring function is such that one letter has a larger score than the other. In this setting, we prove that the variance is of linear order in the common length. OAs generalize longest common subsequences and can be represented as optimal paths in a two-dimensional last passage percolation setting with dependent weights.
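The OA score described above can be computed by the standard alignment dynamic program, which is exactly a last passage value on the grid: a diagonal step into cell (i, j) earns the letter score s(x_i, y_j), and a horizontal or vertical step (a gap) costs a penalty δ. The weights are dependent because every cell in row i shares the letter x_i and every cell in column j shares y_j. The sketch below is illustrative only. The function names, the gap penalty, and the letter scores (matching 1s earns 2, matching 0s earns 1, mismatches earn 0) are hypothetical choices, not the specific scoring function or assumptions of the paper. Setting the match score to 1, mismatches to 0 and δ = 0 recovers the length of the longest common subsequence.

```python
import random
import statistics


def optimal_alignment_score(x, y, score, gap):
    """Optimal alignment score of words x and y by dynamic programming.

    Equivalent to a last passage value on the (len(x)+1) x (len(y)+1) grid:
    diagonal steps earn score(x[i-1], y[j-1]); horizontal/vertical steps earn -gap.
    """
    m, n = len(x), len(y)
    # prev[j] = best score aligning x[:i-1] with y[:j]; row i = 0 is all gaps.
    prev = [-gap * j for j in range(n + 1)]
    for i in range(1, m + 1):
        cur = [-gap * i] + [0.0] * n
        for j in range(1, n + 1):
            cur[j] = max(
                prev[j - 1] + score(x[i - 1], y[j - 1]),  # align x_i with y_j
                prev[j] - gap,                            # x_i aligned with a gap
                cur[j - 1] - gap,                         # y_j aligned with a gap
            )
        prev = cur
    return prev[n]


def asymmetric_score(a, b, s0=1.0, s1=2.0):
    # Hypothetical asymmetric scores: a 1-1 match earns more than a 0-0 match.
    if a != b:
        return 0.0
    return s1 if a == 1 else s0


def estimate_variance(n, trials=100, gap=0.5, seed=0):
    # Monte Carlo estimate of Var(score) for two iid equiprobable binary words of length n.
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        x = [rng.randint(0, 1) for _ in range(n)]
        y = [rng.randint(0, 1) for _ in range(n)]
        samples.append(optimal_alignment_score(x, y, asymmetric_score, gap))
    return statistics.variance(samples)


if __name__ == "__main__":
    # Print n, the estimated variance, and variance / n for a few small lengths.
    for n in (20, 40, 80):
        v = estimate_variance(n)
        print(n, round(v, 3), round(v / n, 4))
```

If the variance grows linearly in n, the printed ratio Var/n should roughly stabilize as n increases. With such short words and few trials, however, the simulation is only indicative and proves nothing about the asymptotic order. Its cost is O(n²) per pair of words, which is why the lengths are kept small.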
Acknowledgments
Research supported in part by the Simons Foundation Grant #246283.
About this article
Cite this article
Houdré, C., Matzinger, H. On the Variance of the Optimal Alignments Score for Binary Random Words and an Asymmetric Scoring Function. J Stat Phys 164, 693–734 (2016). https://doi.org/10.1007/s10955-016-1549-1
DOI: https://doi.org/10.1007/s10955-016-1549-1