A General Framework for Local Pairwise Alignment Statistics with Gaps

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5724)

Abstract

We present a novel dynamic programming framework that allows one to compute tight upper bounds for the p-values of gapped local alignments in pseudo-polynomial time. Our algorithms are fast and simple and, unlike most earlier solutions, require no curve fitting by sampling. Moreover, our new methods do not suffer from the so-called edge effects, a by-product of the common practice used to compute p-values. The new methods also make it possible to reach the very small p-values that are needed when comparing sequences against large databases. In our experiments, accurate estimates of such small p-values were difficult to obtain by curve fitting.
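
To make the contrast with sampling concrete, the following is a minimal sketch of a sampling-based p-value estimate, the kind of baseline the abstract argues against. It is not the framework proposed in the paper. The function names, the scoring values (match 1, mismatch -1, linear gap penalty -2), and the null model of independent, uniformly distributed DNA letters are all assumptions chosen for illustration.

```python
import random


def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty; assumed scores)."""
    prev = [0] * (len(b) + 1)  # previous row of the DP table
    best = 0
    for x in a:
        cur = [0]  # column 0 of the current row
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


def empirical_p_value(observed, n, m, samples, alphabet="ACGT", seed=0):
    """Estimate P(score >= observed) for random sequences of lengths n and m.

    Null model (an assumption): letters drawn independently and uniformly.
    The +1 correction keeps the estimate strictly positive.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        a = [rng.choice(alphabet) for _ in range(n)]
        b = [rng.choice(alphabet) for _ in range(m)]
        if smith_waterman_score(a, b) >= observed:
            hits += 1
    return (hits + 1) / (samples + 1)


if __name__ == "__main__":
    score = smith_waterman_score("ACGTACGT", "TTACGTAA")
    print(score, empirical_p_value(score, 8, 8, samples=1000))
```

With this estimator the smallest value that can ever be reported is 1/(samples + 1), so resolving a p-value near 10^-9 by direct sampling would take on the order of a billion random alignments. Extrapolating from a fitted curve avoids that cost, and the abstract reports that such fits were inaccurate for small p-values in the author's experiments.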

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rastas, P. (2009). A General Framework for Local Pairwise Alignment Statistics with Gaps. In: Salzberg, S.L., Warnow, T. (eds) Algorithms in Bioinformatics. WABI 2009. Lecture Notes in Computer Science, vol 5724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04241-6_20

  • DOI: https://doi.org/10.1007/978-3-642-04241-6_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04240-9

  • Online ISBN: 978-3-642-04241-6

  • eBook Packages: Computer Science, Computer Science (R0)
