Dynamic Programming Algorithms for Two Statistical Problems in Computational Biology

Rahmann, Sven

doi:10.1007/978-3-540-39763-2_12

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 2812)

Abstract

We present dynamic programming algorithms for two exact statistical tests that frequently arise in computational biology. The first test concerns the decision whether an observed sequence stems from a given profile (also known as position specific score matrix or position weight matrix), or from an assumed background distribution. We show that the common assumption that the log-odds score has a Gaussian distribution is false for many short profiles, such as transcription factor binding sites or splice sites. We present an efficient implementation of a non-parametric method (first mentioned by Staden) to compute the exact score distribution. The second test concerns the decision whether observed category counts stem from a specified Multinomial distribution. A branch-and-bound method for computing exact p-values for this test was presented by Bejerano at a recent RECOMB conference. Our contribution is a dynamic programming approach to compute the entire distribution of the test statistic, allowing not only the computation of exact p-values for all values of the test statistic simultaneously, but also of the power function of the test. As one of several applications, we introduce p-value based sequence logos, which provide a more meaningful visual description of probabilistic sequences than conventional sequence logos do.
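The two computations described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the integer (already discretized) scores, the i.i.d. background model, and the toy inputs are all assumptions made for illustration. The first function builds the exact background distribution of a profile score by convolving one column at a time; the second accumulates the distribution of a statistic that is a sum of per-category terms under a Multinomial model.

```python
from collections import defaultdict
from math import factorial


def score_distribution(columns, symbol_probs):
    """Exact distribution of the summed profile score under an i.i.d. model.

    columns: one dict per profile position, mapping symbol -> integer score
             (assumed to be log-odds scores rounded to a fixed granularity).
    symbol_probs: dict mapping symbol -> probability (e.g. the background).
    """
    dist = {0: 1.0}  # before any position is read, the score is 0
    for column in columns:
        nxt = defaultdict(float)
        for total, prob in dist.items():
            for symbol, score in column.items():
                nxt[total + score] += prob * symbol_probs[symbol]
        dist = dict(nxt)
    return dist


def upper_tail(dist, threshold):
    """Probability that the statistic is at least `threshold`."""
    return sum(prob for value, prob in dist.items() if value >= threshold)


def multinomial_statistic_distribution(n, probs, term):
    """Distribution of T = sum_k term(k, n_k) under Multinomial(n, probs).

    term(k, c) must return an integer (discretized) contribution of
    category k when it holds c of the n observations.
    """
    # State (observations used, statistic so far) -> sum of prod p_k^c / c!
    states = {(0, 0): 1.0}
    for k, p in enumerate(probs):
        nxt = defaultdict(float)
        for (used, value), weight in states.items():
            for c in range(n - used + 1):
                nxt[(used + c, value + term(k, c))] += weight * p**c / factorial(c)
        states = nxt
    dist = defaultdict(float)
    for (used, value), weight in states.items():
        if used == n:  # keep only complete count vectors
            dist[value] += weight * factorial(n)
    return dict(dist)


# Hypothetical two-column DNA profile with integer scores.
profile = [
    {"A": 2, "C": 0, "G": 0, "T": -1},
    {"A": 1, "C": 1, "G": -1, "T": -1},
]
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
scores = score_distribution(profile, background)
print(upper_tail(scores, 1))  # background probability of a score >= 1

# Toy Multinomial test: 2 observations, 2 equiprobable categories,
# squared deviation from the expected count 1 as the per-category term.
stat = multinomial_statistic_distribution(2, [0.5, 0.5], lambda k, c: (c - 1) ** 2)
print(upper_tail(stat, 2))  # probability of a statistic value >= 2
```

Because each function returns the whole distribution, tail probabilities for every threshold come from a single pass, which is the property the abstract emphasizes. Real-valued statistics would first have to be rounded onto a lattice so that equal values merge into one table entry.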

References

Bejerano, G.: Efficient exact p-value computation and applications to biosequence analysis. In: RECOMB 2003 Proceedings, April 2003, pp. 38–47. ACM Press, New York (2003)
Durbin, R., Eddy, S., Krogh, A., Mitchison, G.: Biological Sequence Analysis. Cambridge University Press, Cambridge (1998)
Feller, W.: An Introduction to Probability Theory and Its Applications, 2nd edn., vol. 1. John Wiley and Sons, Chichester (1971)
Goldstein, L., Waterman, M.: Approximations to profile score distributions. Journal of Computational Biology 1(1), 93–104 (1994)
Haas, S.A., Beissbarth, T., Rivals, E., Krause, A., Vingron, M.: GeneNest: automated generation and visualization of gene indices. Trends Genet. 16(11), 521–523 (2000)
Press, W.H., Flannery, B.P., Teukolsky, S.A., Vetterling, W.T.: Numerical Recipes in C, 2nd edn. Cambridge University Press, Cambridge (1993)
Rahmann, S., Müller, T., Vingron, M.: On the power and quality of profiles with applications to transcription factor binding site detection. Unpublished Manuscript (2003)
Schneider, T.D., Stephens, R.M.: Sequence logos: A new way to display consensus sequences. Nucleic Acids Res. 18, 6097–6100 (1990)
Staden, R.: Computer methods to locate signals in nucleic acid sequences. Nucleic Acids Res. 12, 505–519 (1984)
Staden, R.: Methods for calculating the probabilities of finding patterns in sequences. CABIOS 5, 89–96 (1989)
Wingender, E., Chen, X., Hehl, R., Karas, H., Liebich, I., Matys, V., Meinhardt, T., Prüss, M., Reuter, I., Schacherer, F.: TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res. 28, 316–319 (2000)

Author information

Authors and Affiliations

Department of Computational Molecular Biology, Max-Planck-Institute for Molecular Genetics, Ihnestraße 63–73, D-14195, Berlin, Germany
Sven Rahmann
Department of Mathematics and Computer Science, Freie Universität, Berlin
Sven Rahmann

Editor information

Editors and Affiliations

Department of Biomathematical Sciences, The Mount Sinai School of Medicine, 10029-6574, New York, NY
Gary Benson
Institute of Biomedical and Life Sciences, Division of Environmental and Evolutionary Biology, University of Glasgow, G12 8QQ, Glasgow, Scotland
Roderic D. M. Page

Cite this paper

Rahmann, S. (2003). Dynamic Programming Algorithms for Two Statistical Problems in Computational Biology. In: Benson, G., Page, R.D.M. (eds) Algorithms in Bioinformatics. WABI 2003. Lecture Notes in Computer Science, vol 2812. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39763-2_12

DOI: https://doi.org/10.1007/978-3-540-39763-2_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20076-5
Online ISBN: 978-3-540-39763-2
eBook Packages: Springer Book Archive
