
Dynamic Programming Algorithms for Two Statistical Problems in Computational Biology

  • Sven Rahmann
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2812)

Abstract

We present dynamic programming algorithms for two exact statistical tests that frequently arise in computational biology. The first test concerns the decision whether an observed sequence stems from a given profile (also known as a position-specific score matrix or position weight matrix), or from an assumed background distribution. We show that the common assumption that the log-odds score has a Gaussian distribution is false for many short profiles, such as transcription factor binding sites or splice sites. We present an efficient implementation of a non-parametric method (first mentioned by Staden) to compute the exact score distribution. The second test concerns the decision whether observed category counts stem from a specified Multinomial distribution. A branch-and-bound method for computing exact p-values for this test was presented by Bejerano at a recent RECOMB conference. Our contribution is a dynamic programming approach to compute the entire distribution of the test statistic, allowing not only the computation of exact p-values for all values of the test statistic simultaneously, but also of the power function of the test. As one of several applications, we introduce p-value-based sequence logos, which provide a more meaningful visual description of probabilistic sequences than conventional sequence logos do.
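To make the first test concrete, the following is a minimal sketch of how an exact profile score distribution can be built up position by position with dynamic programming. It is not the paper's implementation. It assumes an i.i.d. background model and scores that have already been rounded to integers, and the names `score_distribution`, `p_value`, `scores`, and `background` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def score_distribution(scores, background):
    """Exact distribution of the total profile score under the background.

    scores[i][c] is the (assumed integer) score of character c at position i.
    background[c] is the probability of c under an assumed i.i.d. background.
    Returns a dict mapping each reachable total score to its probability.
    """
    # Before any position is scored, the total is 0 with probability 1.
    dist = {0: 1.0}
    for column in scores:
        nxt = defaultdict(float)
        # Extend every partial score by every character at this position.
        for partial, prob in dist.items():
            for char, s in column.items():
                nxt[partial + s] += prob * background[char]
        dist = dict(nxt)
    return dist


def p_value(dist, threshold):
    """Probability that a background sequence scores at least `threshold`."""
    return sum(p for s, p in dist.items() if s >= threshold)


# Toy two-position profile over the DNA alphabet (illustrative numbers only).
toy_scores = [
    {"A": 2, "C": -1, "G": -1, "T": 0},
    {"A": 0, "C": 3, "G": -2, "T": -1},
]
toy_background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

toy_dist = score_distribution(toy_scores, toy_background)
for total in sorted(toy_dist):
    print(total, toy_dist[total])
print("P(score >= 5) =", p_value(toy_dist, 5))
```

In this sketch the work per position is proportional to the number of distinct partial scores times the alphabet size, so the cost depends on how finely the scores are rounded rather than on the number of possible sequences. Because the full distribution is kept, a p-value for any threshold can be read off afterwards without recomputation.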

Keywords

  • Transcription Factor Binding Site
  • Dynamic Program Algorithm
  • Score Distribution
  • Multinomial Distribution
  • Background Distribution
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


References

  1. Bejerano, G.: Efficient exact p-value computation and applications to biosequence analysis. In: RECOMB 2003 Proceedings, April 2003, pp. 38–47. ACM Press, New York (2003)
  2. Durbin, R., Eddy, S., Krogh, A., Mitchison, G.: Biological Sequence Analysis. Cambridge University Press, Cambridge (1998)
  3. Feller, W.: An Introduction to Probability Theory and Its Applications, 2nd edn., vol. 1. John Wiley and Sons, Chichester (1971)
  4. Goldstein, L., Waterman, M.: Approximations to profile score distributions. Journal of Computational Biology 1(1), 93–104 (1994)
  5. Haas, S.A., Beissbarth, T., Rivals, E., Krause, A., Vingron, M.: GeneNest: automated generation and visualization of gene indices. Trends Genet. 16(11), 521–523 (2000)
  6. Press, W.H., Flannery, B.P., Teukolsky, S.A., Vetterling, W.T.: Numerical Recipes in C, 2nd edn. Cambridge University Press, Cambridge (1993)
  7. Rahmann, S., Müller, T., Vingron, M.: On the power and quality of profiles with applications to transcription factor binding site detection. Unpublished Manuscript (2003)
  8. Schneider, T.D., Stephens, R.M.: Sequence logos: A new way to display consensus sequences. Nucleic Acids Res. 18, 6097–6100 (1990)
  9. Staden, R.: Computer methods to locate signals in nucleic acid sequences. Nucleic Acids Res. 12, 505–519 (1984)
  10. Staden, R.: Methods for calculating the probabilities of finding patterns in sequences. CABIOS 5, 89–96 (1989)
  11. Wingender, E., Chen, X., Hehl, R., Karas, H., Liebich, I., Matys, V., Meinhardt, T., Prüss, M., Reuter, I., Schacherer, F.: TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res. 28, 316–319 (2000)

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Sven Rahmann (1, 2)
  1. Department of Computational Molecular Biology, Max-Planck-Institute for Molecular Genetics, Berlin, Germany
  2. Department of Mathematics and Computer Science, Freie Universität, Berlin
