Optimal DNA Signal Recognition Models with a Fixed Amount of Intrasignal Dependency

  • Broňa Brejová
  • Daniel G. Brown
  • Tomáš Vinař
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2812)


We study new probabilistic models for signals in DNA. Our models allow dependencies between multiple non-adjacent positions, in a generative model we call a higher-order tree. In our context, computing the maximum-likelihood model is equivalent to computing a minimum directed spanning hypergraph, a problem we show is NP-complete. We instead compute good models using simple greedy heuristics. In practice, the advantage of our models over more standard models based on adjacent positions is modest. However, there is a notable improvement in the estimation of the probability that a given position is a signal, which is useful in the context of probabilistic gene finding. We also show that incorporating multiple signals involved in gene structure into a composite signal model in our framework yields little improvement, though again this gives better estimation of the probability that a site is an acceptor site signal.
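To make the idea of a greedy heuristic for such models concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a Prim-like strategy: positions are added one at a time, and each new position receives up to k parents chosen from the positions already placed, so the resulting dependency graph is acyclic. The function names (`conditional_entropy`, `greedy_higher_order_tree`) and the selection criterion (largest empirical information gain) are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations
from math import log2


def conditional_entropy(seqs, child, parents):
    """Empirical H(X_child | X_parents) in bits over aligned sequences."""
    n = len(seqs)
    joint = Counter((tuple(s[p] for p in parents), s[child]) for s in seqs)
    context = Counter(tuple(s[p] for p in parents) for s in seqs)
    return -sum(c / n * log2(c / context[ctx]) for (ctx, _), c in joint.items())


def greedy_higher_order_tree(seqs, k):
    """Assumed heuristic: repeatedly add the position whose best parent set
    (of size min(k, number of placed positions)) gives the largest drop in
    empirical entropy. Returns a dict mapping position -> tuple of parents."""
    remaining = set(range(len(seqs[0])))
    placed, parents = [], {}
    while remaining:
        size = min(k, len(placed))
        best = None
        for child in sorted(remaining):
            marginal = conditional_entropy(seqs, child, ())
            for ps in combinations(placed, size):
                gain = marginal - conditional_entropy(seqs, child, ps)
                candidate = (-gain, child, ps)  # ties go to the smallest index
                if best is None or candidate < best:
                    best = candidate
        _, child, ps = best
        parents[child] = ps
        placed.append(child)
        remaining.remove(child)
    return parents


# Toy alignment (made up for illustration): position 2 copies position 0,
# while position 1 is independent of both.
toy = ["AAA", "ACA", "CAC", "CCC"]
tree = greedy_higher_order_tree(toy, k=1)
```

On this toy alignment the sketch links position 2 to position 0, a dependency between non-adjacent positions that a model restricted to adjacent positions cannot express. Parent sets are searched exhaustively, which is practical only for small k and short signals, and no correction for overfitting is applied.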


Optimal Topology; Directed Acyclic Graph; Acceptor Site; Donor Splice Site; Position Weight Matrix
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Broňa Brejová (1)
  • Daniel G. Brown (1)
  • Tomáš Vinař (1)
  1. School of Computer Science, University of Waterloo, Waterloo, Canada
