Optimal DNA Signal Recognition Models with a Fixed Amount of Intrasignal Dependency
We study new probabilistic models for signals in DNA. Our models allow dependencies among multiple non-adjacent positions within a generative model that we call a higher-order tree. In our context, computing the maximum-likelihood model is equivalent to computing a minimum directed spanning hypergraph, a problem that we show is NP-complete. We instead compute good models using simple greedy heuristics. In practice, the advantage of our models over more standard models based on adjacent positions is modest. However, there is a notable improvement in the estimate of the probability that a given position is a signal, which is useful in probabilistic gene finding. We also show that incorporating multiple signals involved in gene structure into a composite signal model in our framework yields little improvement, though it again gives a better estimate of the probability that a site is an acceptor site signal.
Keywords: Optimal Topology; Directed Acyclic Graph; Acceptor Site; Donor Splice Site; Position Weight Matrix
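To make the idea of greedily building a dependency model concrete, here is a minimal Python sketch. It is not the authors' heuristic, which the abstract does not specify. It relies on a standard fact: for a fixed acyclic dependency structure with maximum-likelihood parameters, the log-likelihood of the training set equals minus the number of sequences times the sum of the empirical conditional entropies of each position given its parents, so lowering that sum raises the likelihood. The function names, the `max_parents` bound, the choice of root, and the Prim-style attachment order are all assumptions made for illustration, and the sketch uses raw counts with no pseudocounts.

```python
from collections import Counter
from itertools import combinations
from math import log2


def conditional_entropy(seqs, child, parents):
    """Empirical H(X_child | X_parents) in bits over aligned sequences."""
    n = len(seqs)
    joint = Counter((tuple(s[p] for p in parents), s[child]) for s in seqs)
    context = Counter(tuple(s[p] for p in parents) for s in seqs)
    return -sum(c / n * log2(c / context[ctx]) for (ctx, _), c in joint.items())


def greedy_dependency_model(seqs, max_parents=2):
    """Hypothetical Prim-style greedy: attach the cheapest (position, parent set).

    Returns a dict mapping each position to a tuple of parent positions.
    Parents are drawn only from positions already placed, so the result
    is acyclic by construction.
    """
    length = len(seqs[0])
    # Assumption: start from the position with the lowest marginal entropy.
    root = min(range(length), key=lambda i: conditional_entropy(seqs, i, ()))
    parents_of = {root: ()}
    while len(parents_of) < length:
        placed = sorted(parents_of)
        best = None
        for child in range(length):
            if child in parents_of:
                continue
            # Try every parent set of size 0..max_parents among placed positions.
            for k in range(min(max_parents, len(placed)) + 1):
                for pset in combinations(placed, k):
                    cost = conditional_entropy(seqs, child, pset)
                    if best is None or cost < best[0]:
                        best = (cost, child, pset)
        _, child, pset = best
        parents_of[child] = pset
    return parents_of


if __name__ == "__main__":
    # Toy aligned "signals": position 2 always copies position 0.
    toy = ["AAA", "CAC", "GCG", "TCT"]
    print(greedy_dependency_model(toy, max_parents=2))
```

Because empirical conditional entropy never increases when parents are added, a sketch like this tends to pick large parent sets on small samples; a practical version would need smoothing or a penalty on the number of parameters. The exhaustive search over parent sets is also exponential in `max_parents`, which is tolerable only for short signals and small bounds.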