Gene Structure Submodels

Axelson-Fisk, Marina

doi:10.1007/978-1-4471-6693-1_5

Chapter
First Online: 01 January 2015

Part of the book series: Computational Biology (COBO, volume 20)

Abstract

A gene model algorithm integrates a wide range of scores, or signals, produced by the constituent states of the model. These states are themselves complex submodels, each incorporating a number of sensors that score different characteristics of the submodel. Such sensors are traditionally divided into two groups: content sensors and signal sensors. Signal sensors model the transitions between states and attempt to detect the boundaries between exons and introns in the sequence, while content sensors score the content of a candidate region, such as the base composition or length distribution of a candidate exon or intron. In this chapter we describe some of the main submodels used in gene-finding algorithms and detail a number of methods for integrating the sensors that the submodels incorporate.
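The distinction between the two sensor types can be illustrated with a minimal Python sketch. It is not taken from the chapter: the position weight matrix used for the signal sensor, the base-composition log-odds used for the content sensor, the function names, and all sequences and frequencies below are assumptions chosen purely for illustration.

```python
import math

BASES = "ACGT"

def train_pwm(sites, pseudocount=1.0):
    """Estimate per-position base probabilities from aligned example sites."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: counts[b] / total for b in BASES})
    return pwm

def signal_score(pwm, window, background=0.25):
    """Signal sensor: log-odds of a fixed-length window at a candidate
    boundary, under the site model versus a uniform background."""
    return sum(math.log(pwm[i][b] / background) for i, b in enumerate(window))

def content_score(region, coding_freq, noncoding_freq):
    """Content sensor: log-odds of the base composition of a whole
    candidate region (any length), coding versus non-coding."""
    return sum(math.log(coding_freq[b] / noncoding_freq[b]) for b in region)

# Toy training data and frequencies (invented, not real estimates).
donor_sites = ["GTAAGT", "GTGAGT", "GTAAGA", "GTAAGT"]
pwm = train_pwm(donor_sites)
coding = {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}
noncoding = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

best_hit = signal_score(pwm, "GTAAGT")   # resembles the training sites
poor_hit = signal_score(pwm, "CCCCCC")   # does not resemble them
exon_like = content_score("GCCGCGGCC", coding, noncoding)
intron_like = content_score("ATTATATTA", coding, noncoding)
```

In this sketch the signal sensor looks only at a short fixed window around a putative boundary, whereas the content sensor accumulates evidence over the entire candidate region. A gene finder would combine both kinds of score when evaluating a candidate exon or intron.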

Author information

Authors and Affiliations

Chalmers University of Technology, Gothenburg, Sweden
Marina Axelson-Fisk

Corresponding author

Correspondence to Marina Axelson-Fisk.

About this chapter

Cite this chapter

Axelson-Fisk, M. (2015). Gene Structure Submodels. In: Comparative Gene Finding. Computational Biology, vol 20. Springer, London. https://doi.org/10.1007/978-1-4471-6693-1_5

DOI: https://doi.org/10.1007/978-1-4471-6693-1_5
Published: 14 April 2015
Publisher Name: Springer, London
Print ISBN: 978-1-4471-6692-4
Online ISBN: 978-1-4471-6693-1
eBook Packages: Computer Science, Computer Science (R0)