Abstract
We study new probabilistic models for signals in DNA. Our models allow dependencies between multiple non-adjacent positions, in a generative model we call a higher-order tree. In our context, computing the maximum-likelihood model is equivalent to computing a minimum directed spanning hypergraph, a problem we show is NP-complete. We instead compute good models using simple greedy heuristics. In practice, the advantage of our models over more standard models based on adjacent positions is modest. However, there is a notable improvement in the estimation of the probability that a given position is a signal, which is useful in the context of probabilistic gene finding. We also show that incorporating multiple signals involved in gene structure into a composite signal model in our framework yields little improvement, though again it gives better estimation of the probability that a site is an acceptor site signal.
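To make the idea of a greedy structure search concrete, the following is a minimal sketch and not the authors' heuristic, which the abstract does not specify. It assumes aligned signal sequences of equal length, lets each position depend on at most k earlier-placed positions, and uses the fact that with empirical (unsmoothed) parameter estimates, maximizing likelihood amounts to minimizing the sum of conditional entropies H(X_i | parents of i). The function names `cond_entropy` and `greedy_structure`, the Prim-like growth order, and the toy data are all illustrative assumptions.

```python
from collections import Counter
from itertools import combinations
from math import log2


def cond_entropy(seqs, child, parents):
    """Empirical conditional entropy H(X_child | X_parents) in bits."""
    n = len(seqs)
    # Counts of (parent context, child symbol) and of parent context alone.
    joint = Counter((tuple(s[p] for p in parents), s[child]) for s in seqs)
    ctx = Counter(tuple(s[p] for p in parents) for s in seqs)
    return -sum(c / n * log2(c / ctx[pa]) for (pa, _), c in joint.items())


def greedy_structure(seqs, k):
    """Hypothetical Prim-like greedy search (an assumption, not the paper's).

    Returns {position: tuple of parent positions}. Parents are drawn only
    from positions already placed, so the structure is acyclic.
    """
    length = len(seqs[0])
    # Start from the position with the lowest marginal entropy.
    root = min(range(length), key=lambda i: cond_entropy(seqs, i, ()))
    parents = {root: ()}
    while len(parents) < length:
        placed = sorted(parents)
        best = None
        for child in range(length):
            if child in parents:
                continue
            # Try every parent set of size 0..k among placed positions.
            for size in range(min(k, len(placed)) + 1):
                for ps in combinations(placed, size):
                    h = cond_entropy(seqs, child, ps)
                    if best is None or h < best[0]:
                        best = (h, child, ps)
        _, child, ps = best
        parents[child] = ps
    return parents


# Toy data: position 2 is determined jointly by positions 0 and 1,
# but is independent of each of them individually.
toy = ["AAA", "ACC", "CAC", "CCA"]
print(greedy_structure(toy, k=2))  # {0: (), 1: (), 2: (0, 1)}
```

On the toy data, no single parent reduces the entropy of position 2, so a model limited to one parent per position (k=1) gains nothing, whereas k=2 captures the dependency exactly. The sketch uses no pseudocounts, so on real data large parent sets would overfit; the exhaustive inner loop is also only practical for short signals and small k.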
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Brejová, B., Brown, D.G., Vinař, T. (2003). Optimal DNA Signal Recognition Models with a Fixed Amount of Intrasignal Dependency. In: Benson, G., Page, R.D.M. (eds) Algorithms in Bioinformatics. WABI 2003. Lecture Notes in Computer Science(), vol 2812. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39763-2_7
DOI: https://doi.org/10.1007/978-3-540-39763-2_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20076-5
Online ISBN: 978-3-540-39763-2
eBook Packages: Springer Book Archive