Abstract
Current numerical methods for assessing the statistical significance of local alignments with gaps are time-consuming, and analytical solutions have so far been limited to specific cases. Here, we present a new line of attack on the problem of statistical significance assessment. We combine this approach with known properties of the dynamics of the global alignment algorithm and high-performance numerical techniques to obtain a novel method for assessing the significance of alignments with gaps on practical time scales. The results of the new method agree very well with those of established methods while requiring drastically less effort.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chia, N., Bundschuh, R. (2005). A Practical Approach to Significance Assessment in Alignment with Gaps. In: Miyano, S., Mesirov, J., Kasif, S., Istrail, S., Pevzner, P.A., Waterman, M. (eds) Research in Computational Molecular Biology. RECOMB 2005. Lecture Notes in Computer Science, vol 3500. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11415770_36
DOI: https://doi.org/10.1007/11415770_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25866-7
Online ISBN: 978-3-540-31950-4
eBook Packages: Computer Science (R0)