A Practical Approach to Significance Assessment in Alignment with Gaps

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3500)

Abstract

Current numerical methods for assessing the statistical significance of local alignments with gaps are time-consuming, and analytical solutions have so far been limited to specific cases. Here, we present a new line of attack on the problem of statistical significance assessment. We combine this approach with known properties of the dynamics of the global alignment algorithm and with high-performance numerical techniques to obtain a novel method for assessing the significance of alignments with gaps within practical time scales. The results of the new method agree very well with those of established methods while requiring drastically less effort.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chia, N., Bundschuh, R. (2005). A Practical Approach to Significance Assessment in Alignment with Gaps. In: Miyano, S., Mesirov, J., Kasif, S., Istrail, S., Pevzner, P.A., Waterman, M. (eds) Research in Computational Molecular Biology. RECOMB 2005. Lecture Notes in Computer Science, vol 3500. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11415770_36

  • DOI: https://doi.org/10.1007/11415770_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25866-7

  • Online ISBN: 978-3-540-31950-4

  • eBook Packages: Computer Science, Computer Science (R0)
