Journal of Statistical Physics, Volume 164, Issue 3, pp 693–734

On the Variance of the Optimal Alignments Score for Binary Random Words and an Asymmetric Scoring Function

  • Christian Houdré
  • Heinrich Matzinger


We investigate the order of the variance of the optimal alignments (OA) score of two independent iid binary random words of the same length. The letters are equiprobable, but the scoring function is asymmetric: one letter has a larger score than the other. In this setting, we prove that the variance is of linear order in the common length. OAs generalize longest common subsequences and can be represented as optimal paths in a two-dimensional last passage percolation setting with dependent weights.
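To make the object of study concrete, the following is a minimal sketch of how an optimal alignment score could be computed by dynamic programming and how its variance could be estimated by simulation. It is not the authors' construction: the scoring function used here (aligned equal letters score `score[letter]`, while mismatches and gaps score 0) and the example values `{0: 1, 1: 2}` are assumptions chosen for illustration, as are the function names `oa_score` and `sample_variance_of_score`. With equal scores for both letters, the recursion reduces to the length of the longest common subsequence.

```python
import random


def oa_score(x, y, score):
    """Optimal alignment score of words x and y (sequences of 0/1).

    Assumed scoring (illustrative only): aligning two equal letters a
    contributes score[a]; mismatches and gaps contribute 0.
    """
    n, m = len(x), len(y)
    prev = [0] * (m + 1)  # row i-1 of the DP table
    for i in range(1, n + 1):
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            gain = score[x[i - 1]] if x[i - 1] == y[j - 1] else 0
            # Best of: align x[i-1] with y[j-1], skip x[i-1], skip y[j-1].
            cur[j] = max(prev[j - 1] + gain, prev[j], cur[j - 1])
        prev = cur
    return prev[m]


def sample_variance_of_score(n, trials, score, seed=0):
    """Unbiased sample variance of the score over independent pairs of
    iid words of length n with equiprobable binary letters."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        x = [rng.randint(0, 1) for _ in range(n)]
        y = [rng.randint(0, 1) for _ in range(n)]
        scores.append(oa_score(x, y, score))
    mean = sum(scores) / trials
    return sum((s - mean) ** 2 for s in scores) / (trials - 1)


if __name__ == "__main__":
    asymmetric = {0: 1, 1: 2}  # hypothetical asymmetric scoring function
    for n in (50, 100, 200):
        print(n, sample_variance_of_score(n, trials=200, score=asymmetric))
```

The table filled in by `oa_score` can be read as a maximization over monotone paths in an n-by-m grid, which is the sense in which such scores resemble a last passage percolation problem; the weights at different grid points depend on the same letters and are therefore not independent. A simulation of this kind can only suggest how the variance grows with n and proves nothing about its order.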


Optimal alignments, Variance bounds, Longest common subsequences, Last passage percolation

Mathematics Subject Classification

60K35 60C05 05A05 



Research supported in part by the Simons Foundation Grant #246283.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. School of Mathematics, Georgia Institute of Technology, Atlanta, USA
